This short tutorial covers the basics of the Transformer, a neural network architecture designed for handling sequential data in machine learning.
Timestamps:
0:00 - Intro
1:18 - Motivation for developing the Transformer
2:44 - Input embeddings (start of encoder walk-through)
3:29 - Attention
6:29 - Multi-head attention
7:55 - Positional encodings
9:59 - Add & norm, feedforward, & stacking encoder layers
11:14 - Masked multi-head attention (start of decoder walk-through)
12:35 - Cross-attention
13:38 - Decoder output & prediction probabilities
14:46 - Complexity analysis
16:00 - Transformers as graph neural networks
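
For a concrete reference while following the attention sections (3:29, 6:29, and 11:14), below is a minimal NumPy sketch of scaled dot-product attention, including the masking used in the decoder. The function and variable names are illustrative assumptions, not code taken from the video or the paper.

```python
# A minimal sketch of scaled dot-product attention; shapes and names are
# illustrative assumptions, not code from the video.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V: (seq_len, d_k) arrays; mask: optional (seq_len, seq_len) boolean,
    True where attention is allowed."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise similarity scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed positions (decoder masking)
    weights = softmax(scores, axis=-1)         # attention weights sum to 1 per query
    return weights @ V                         # weighted sum of value vectors

# Example: self-attention over 4 tokens with dimension 8 (Q = K = V = x)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```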
Original Transformer paper:
Attention Is All You Need -
Other papers mentioned:
(GPT-3) Language Models are Few-Shot Learners -
(DALL-E) Zero-Shot Text-to-Image Generation -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding -