2. The spelled-out intro to language modeling: building makemore
We implement a bigram character-level language model, which we will further complexify in follow-up videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor, its subtleties, and its use in efficiently evaluating neural networks, and (2) the overall framework of language modeling, which includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).
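The counting-based half of that framework can be sketched in a few lines. This is a minimal, hedged sketch (a tiny inline word list stands in for the names.txt dataset used in the video; the variable names are illustrative): count bigrams into a tensor, normalize rows into probabilities, and evaluate the average negative log likelihood.

```python
import torch

# Tiny stand-in dataset (the video uses names.txt from the makemore repo)
words = ["emma", "olivia", "ava"]

chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0                      # single start/end token
V = len(stoi)                      # vocabulary size (27 with the full dataset)

# "training" = counting bigram occurrences into a V x V tensor
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    for ch1, ch2 in zip('.' + w, w + '.'):
        N[stoi[ch1], stoi[ch2]] += 1

# rows -> probabilities; the +1 is model smoothing with fake counts
P = (N + 1).float()
P /= P.sum(1, keepdim=True)        # broadcasting: (V,V) / (V,1)

# evaluation: average negative log likelihood of the data under the model
log_likelihood, n = 0.0, 0
for w in words:
    for ch1, ch2 in zip('.' + w, w + '.'):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]]).item()
        n += 1
nll = -log_likelihood / n
print(f'nll = {nll:.4f}')
```

Lower nll means the model assigns higher probability to the observed bigrams; the smoothing keeps any unseen bigram from contributing infinite loss.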
Links:
- makemore on github:
- jupyter notebook I built in this video:
Useful links for practice:
- Python Numpy tutorial from CS231n. We use PyTorch tensors instead of NumPy arrays in this video. Their designs (e.g. broadcasting, data types, etc.) are so similar that practicing one is basically practicing the other; just be careful with some of the APIs - how various functions are named, what arguments they take, etc. - these details can vary.
- PyTorch tutorial on Tensor
- Another PyTorch intro to Tensor
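As a quick illustration of the NumPy/PyTorch similarity (and the small API differences the note above warns about), here is a hedged side-by-side sketch of the same row normalization in both libraries; the arrays here are arbitrary example data:

```python
import numpy as np
import torch

# Dividing a (3,3) matrix by its (3,1) column of row sums normalizes each
# row to sum to 1 -- broadcasting behaves identically in both libraries.
a = np.arange(9, dtype=np.float64).reshape(3, 3) + 1.0
a_norm = a / a.sum(axis=1, keepdims=True)    # NumPy spelling: axis=, keepdims=

t = torch.arange(9, dtype=torch.float64).reshape(3, 3) + 1.0
t_norm = t / t.sum(dim=1, keepdim=True)      # PyTorch spelling: dim=, keepdim=

print(np.allclose(a_norm, t_norm.numpy()))   # same numerical result
```

The semantics match exactly; only the keyword names differ (`axis`/`keepdims` vs `dim`/`keepdim`), which is the kind of detail to double-check when switching between the two.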
Exercises:
E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss; did it improve over the bigram model?
E02: split up the dataset randomly into 80% train set, 10% dev set, 10% test set. Train the bigram and trigram models only on the training set. Evaluate them on dev and test splits. What can you see?
E03: use the dev set to tune the strength of smoothing (or regularization) for the trigram model - i.e. try many possibilities and see which one works best based on the dev set loss. What patterns can you see in the train and dev set loss as you tune this strength? Take the best setting of the smoothing and evaluate on the test set only once, at the end. How good of a loss do you achieve?
E04: we saw that our 1-hot vectors merely select a row of W, so producing these vectors explicitly feels wasteful. Can you delete our use of F.one_hot in favor of simply indexing into rows of W?
E05: look up F.cross_entropy and use it instead. You should achieve the same result. Can you think of why we’d prefer to use F.cross_entropy instead?
E06: meta-exercise! Think of a fun/interesting exercise and complete it.
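The observation behind E04 is easy to verify directly. A minimal sketch (the shapes and seed are illustrative, not from the video): multiplying a one-hot row vector by W just selects the corresponding row of W, so the explicit encoding can be replaced by plain indexing.

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)   # hypothetical seed for reproducibility
W = torch.randn((27, 27), generator=g)  # one linear layer's weight matrix
xs = torch.tensor([0, 5, 13])           # some example input character indices

# the explicit route: build one-hot vectors, then matrix-multiply
xenc = F.one_hot(xs, num_classes=27).float()
logits_onehot = xenc @ W                # (3,27) @ (27,27) -> (3,27)

# the direct route: index rows of W
logits_index = W[xs]                    # same (3,27) result, no one-hot needed

print(torch.allclose(logits_onehot, logits_index))  # True
```

Indexing skips materializing the one-hot matrix and the multiply entirely, which is why it is the preferred form (and is what embedding lookups do in later videos).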
Chapters:
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor (“training the model”)
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the “neural net”: one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer’s weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion