∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)

#inftyformer #infinityformer #transformer

Vanilla Transformers are excellent sequence models, but they suffer from very harsh constraints on the length of the sequences they can process. Several attempts have been made to extend the Transformer’s sequence length, but few have successfully gone beyond a constant-factor improvement. This paper presents a method, based on continuous attention mechanisms, for attending to an unbounded past sequence by representing the past as a continuous signal rather than a discrete sequence. This enables the Infty-Former to effectively enrich the current context with global information, which improves performance on long-range dependencies in sequence tasks. Further, the paper introduces the concept of sticky memories, which highlight past events of particular importance and elevate their representation in the long-term memory.

OUTLINE:
0:00 - Intro & Overview
1:10 - Sponsor Spot: Weights & Biases
3:35 - Problem Statement
8:00 - Continuous Attention
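To make the core idea concrete, here is a minimal sketch (not the authors' code) of what "representing the past as a continuous signal" can look like: a long sequence of embeddings is compressed into coefficients over a fixed set of radial basis functions, and a read from this memory is a Gaussian density over the continuous position axis instead of a softmax over discrete tokens. The function names, the choice of RBF basis, ridge regression for fitting, and the numerical integration are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a continuous long-term memory, assuming an RBF basis and
# ridge-regression fitting; names like fit_continuous_memory, continuous_attention,
# num_basis, and sigma are hypothetical, not from the paper's code.

import numpy as np

def rbf_basis(t, centers, width=0.05):
    """Evaluate Gaussian radial basis functions psi(t) at positions t in [0, 1]."""
    # Returns an array of shape (len(t), num_basis).
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def fit_continuous_memory(X, num_basis=32, ridge=1e-4):
    """Compress a (seq_len, d_model) sequence into coefficients B
    such that X(t) ~= B^T psi(t), fitted by ridge regression."""
    seq_len, _ = X.shape
    t = np.linspace(0.0, 1.0, seq_len)           # map discrete positions to [0, 1]
    centers = np.linspace(0.0, 1.0, num_basis)
    Psi = rbf_basis(t, centers)                  # (seq_len, num_basis)
    # Solve (Psi^T Psi + ridge * I) B = Psi^T X  ->  B has shape (num_basis, d_model)
    A = Psi.T @ Psi + ridge * np.eye(num_basis)
    B = np.linalg.solve(A, Psi.T @ X)
    return B, centers

def continuous_attention(B, centers, mu, sigma, num_samples=512):
    """Read from the continuous memory with a Gaussian N(mu, sigma^2) over [0, 1]:
    approximates E_{t ~ N}[X(t)] by numerical integration."""
    t = np.linspace(0.0, 1.0, num_samples)
    density = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    density /= density.sum()                     # normalize on the truncated support
    Psi = rbf_basis(t, centers)                  # (num_samples, num_basis)
    return density @ (Psi @ B)                   # context vector of shape (d_model,)

# Usage: compress a long "past" of 10,000 token embeddings into 32 coefficients,
# then read a context vector focused around the middle of the sequence.
past = np.random.randn(10_000, 64)
B, centers = fit_continuous_memory(past, num_basis=32)
context = continuous_attention(B, centers, mu=0.5, sigma=0.1)
print(context.shape)  # (64,)
```

Note that the memory cost depends only on `num_basis` and the model dimension, not on the sequence length, which is what lets this kind of memory grow "unbounded" in the number of past tokens it summarizes.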