Re: Implementation [01]: A Deep Dive into NanoGPT

Re: Implementation Series — Episode 01
Welcome to my open notebook. In this series, I am rebuilding the most influential models in AI history to prepare for my MPhil research. No black boxes, just code and first principles.

Overview

We start where many modern LLM journeys begin: Andrej Karpathy's nanoGPT.

Its README describes it as the simplest, fastest repository for training/fine-tuning medium-sized GPTs. But merely running it isn't enough. In this post, we will first walk through the theory and then construct the model from scratch, following Karpathy's implementation.