Re: Implementation [02]: GPT + Mixture of Experts (MoE)
Published on February 1, 2026
Tags: Re:Implementation, Deep-Learning, GPT, MoE

Building upon our basic GPT, we now implement a Sparse Mixture of Experts (MoE) architecture. This lets us scale up model capacity (parameters) without proportionally increasing computational cost (FLOPs) during inference.
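To make the capacity-versus-FLOPs trade-off concrete, here is a minimal sketch of a sparse MoE feed-forward layer with top-k routing. This is an illustrative toy in NumPy, not the post's actual implementation: the class name `SparseMoE`, the two-layer ReLU experts, and all dimensions are assumptions chosen for clarity. Each token's router scores all experts, but only the `top_k` highest-scoring experts actually run, so compute per token stays roughly constant as `num_experts` grows.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SparseMoE:
    """Toy sparse Mixture-of-Experts layer (illustrative sketch only).

    Each expert is a small two-layer ReLU MLP. A linear router scores
    the experts per token; only the top_k experts are evaluated, and
    their outputs are mixed by the renormalized routing weights.
    """
    def __init__(self, d_model, d_ff, num_experts, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: one score per expert for each token.
        self.router = rng.normal(0, 0.02, (d_model, num_experts))
        # Per-expert MLP weights, stacked along the expert axis.
        self.w1 = rng.normal(0, 0.02, (num_experts, d_model, d_ff))
        self.w2 = rng.normal(0, 0.02, (num_experts, d_ff, d_model))

    def __call__(self, x):
        # x: (tokens, d_model)
        probs = softmax(x @ self.router, axis=-1)          # (tokens, experts)
        topk_idx = np.argsort(probs, axis=-1)[:, -self.top_k:]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            gates = probs[t, topk_idx[t]]
            gates = gates / gates.sum()                    # renormalize over chosen experts
            for e, g in zip(topk_idx[t], gates):
                h = np.maximum(x[t] @ self.w1[e], 0.0)     # expert MLP, ReLU
                out[t] += g * (h @ self.w2[e])
        return out

# Usage: 8 tokens routed through 2 of 4 experts each.
moe = SparseMoE(d_model=16, d_ff=32, num_experts=4, top_k=2)
y = moe(np.random.default_rng(1).normal(size=(8, 16)))
print(y.shape)  # (8, 16)
```

Note the key property: total parameters scale with `num_experts`, while per-token FLOPs scale only with `top_k`, which is the scaling behavior described above. A production layer would batch tokens by expert and add a load-balancing loss rather than loop per token.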