Can Wikipedia Help Offline Reinforcement Learning? (Author Interview)

#wikipedia #reinforcementlearning #languagemodels

Original paper review here: Machel Reid and Yutaro Yamada join me to discuss their recent paper on language model pre-training for decision transformers in offline reinforcement learning.

OUTLINE:
0:00 - Intro
1:00 - Brief paper, setup & idea recap
7:30 - Main experimental results & high standard deviations
10:00 - Why is there no clear winner?
13:00 - Why are bigger models not a lot better?
14:30 - What’s behind the name ChibiT?
15:30 - Why is iGPT underperforming?
19:15 - How are tokens distributed in Reinforcement Learning?
22:00 - What other domains could have good properties to transfer?
24:20 - A deeper dive into the models’ attention patterns
33:30 - Codebase, model sizes, and compute requirements
37:30 - Scaling behavior of pre-trained models
40:05 - What did not work out in this project?
42:00 - How can people get started and where to go next?

Paper:
Code: