Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
A complete explanation of all the layers of a Transformer model (Multi-Head Self-Attention, Positional Encoding, and more), including all the matrix multiplications and a full description of the training and inference process.
Slides PDF:
Chapters
00:00 - Intro
01:10 - RNNs and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference
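The two core computations walked through in the video (sinusoidal positional encoding and single-head scaled dot-product attention) can be sketched in NumPy. This is a minimal illustration, not the lecture's code; the function names and shapes are my own choices:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from "Attention Is All You Need":
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / (10000 ** (i / d_model))    # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

def single_head_attention(x, w_q, w_k, w_v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)            # (seq_len, seq_len)
    # Numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                          # (seq_len, d_v)

# Example: a sequence of 5 token embeddings of dimension 8, projected to d_k = d_v = 4
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8)) + positional_encoding(5, 8)
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
out = single_head_attention(x, w_q, w_k, w_v)  # shape (5, 4)
```

Multi-head attention, as covered at 28:30, runs several such heads in parallel on learned projections of the same input and concatenates their outputs before a final linear layer.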