Making Transformers go brum, brum, brum 🏎 (with Lewis Tunstall)

Since their introduction in 2017, Transformers have rapidly become the state-of-the-art approach to modeling tasks in NLP, computer vision, and audio. However, in many situations accuracy is not enough: your fancy model isn’t very useful if it’s too slow or too large to meet the business requirements of your application. In this talk, Lewis will discuss various techniques you can use to optimize Transformer models for production environments. He will cover knowledge distillation and weight quantization, as well as frameworks like ONNX Runtime. This talk is based on a chapter from the upcoming O’Reilly book “Natural Language Processing with Transformers”, and we’ll be giving away 5 copies of the book as part of this event!
