Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos (Paper Explained)
#openai #vpt #minecraft
Minecraft is one of the harder challenges any RL agent could face. Episodes are long, and the world is procedurally generated, complex, and huge. Further, the action space is a keyboard and a mouse, which the agent has to operate given only the game’s video input. OpenAI tackles this challenge with Video PreTraining, using a small set of labeled contractor data to pseudo-label a giant corpus of scraped gameplay footage. The pre-trained model is highly capable in basic game mechanics and fine-tunes much better than a model trained from scratch. This is the first Minecraft agent that achieves the elusive goal of crafting a diamond pickaxe all by itself.
OUTLINE:
0:00 - Intro
3:50 - How to spend money most effectively?
8:20 - Getting a large dataset with labels
14:40 - Model architecture
19:20 - Experimental results and fine-tuning
25:40 - Reinforcement Learning to the Diamond Pickaxe
30:00 - Final comments and hardware
Blog:
Paper:
Code & Model weights:
Abstract:
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data -- here, online videos of people playing Minecraft -- from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.
Authors: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
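The recipe described in the abstract (small labeled set, then an inverse dynamics model, then pseudo-labels, then a behavioral prior) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the paper's implementation: the "game" is a single integer position, the "player" simply walks toward 0, and both models are majority-vote lookup tables standing in for the neural networks that VPT trains on video frames. All names (`record`, `majority_table`, `act`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict

LEFT, STAY, RIGHT = 0, 1, 2
MOVE = {LEFT: -1, STAY: 0, RIGHT: +1}

def expert_action(x):
    """Toy 'human player': walk toward position 0, then stand still."""
    if x > 0:
        return LEFT
    if x < 0:
        return RIGHT
    return STAY

def record(start, steps=12):
    """Play one toy episode and return (frames, actions)."""
    frames, actions = [start], []
    for _ in range(steps):
        a = expert_action(frames[-1])
        actions.append(a)
        frames.append(frames[-1] + MOVE[a])
    return frames, actions

def majority_table(pairs):
    """Map each key to its most frequent label (stand-in for a trained model)."""
    counts = defaultdict(Counter)
    for key, label in pairs:
        counts[key][label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def sign(x):
    return (x > 0) - (x < 0)

# 1) Small labeled "contractor" set: frames plus the recorded actions.
labeled = [record(s) for s in (-4, 3)]

# 2) Inverse dynamics model: non-causal, it looks at frame t AND frame t+1.
idm = majority_table(
    (f[t + 1] - f[t], a[t]) for f, a in labeled for t in range(len(a))
)

# 3) Larger "scraped" set: frames only, the actions are thrown away.
unlabeled = [record(s)[0] for s in range(-10, 11)]

# 4) Pseudo-label the unlabeled frames with the inverse dynamics model.
pseudo = [
    (f[t], idm[f[t + 1] - f[t]]) for f in unlabeled for t in range(len(f) - 1)
]

# 5) Behavioral cloning: a causal policy that only sees the current frame.
policy = majority_table((sign(x), a) for x, a in pseudo)

def act(x):
    """Action chosen by the cloned policy at position x."""
    return policy[sign(x)]
```

In this sketch, `act(5)` returns `LEFT` and `act(-3)` returns `RIGHT`, even though the policy itself never saw a ground-truth action. The point it illustrates is the asymmetry: the inverse dynamics model may peek at the next frame, which makes labeling easy, while the final policy has to act from the present frame alone.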
Links:
Homepage:
Merch:
YouTube:
Twitter:
Discord:
LinkedIn:
If you want to support me, the best thing to do is to share the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n