Open Problems in Mechanistic Interpretability: A Whirlwind Tour
A Google TechTalk, presented by Neel Nanda, 2023/06/20
Google Algorithms Seminar. Abstract: Mechanistic interpretability is the study of reverse-engineering the algorithms learned by a trained neural network, in the hope of using this understanding to make powerful systems safer and more steerable. In this talk, Neel will give an overview of the field, summarise some key works, and outline what he sees as the most promising areas of future work and open problems. Topics include causal abstraction and mediation analysis, superposition and distributed representations, model editing, and the study of individual circuits and neurons.
About the Speaker: Neel works on the mechanistic interpretability team at Google DeepMind. He previously worked with Chris Olah at Anthropic on the transformer circuits agenda, and has done independent work on reverse-engineering modular addition and using this to understand grokking.