Do Vision Transformers See Like Convolutional Neural Networks? | Paper Explained
In this video I cover the “Do Vision Transformers See Like Convolutional Neural Networks?” paper. The authors dissect ViTs and ResNets and show how the features they learn differ, as well as what contributes to those differences (e.g., the amount of training data and skip connections).
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✅ Paper:
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⌚️ Timetable:
00:00 Intro
00:45 Contrasting features in ViTs vs CNNs
06:45 Global vs Local receptive fields
13:55 Data matters, Mr. Obvious
17:40 Contrasting receptive fields
20:30 Data flow through CLS vs spatial tokens
23:30 Skip connections matter a lot in ViTs
24:20 Spatial inform