Multi-modal Deep Learning for Complex Document Understanding with Doug Burdick - #541
Today we’re joined by Doug Burdick, a principal research staff member at IBM Research. In a recent interview, Doug’s colleague Yunyao Li joined us to talk through some of the broader enterprise NLP problems she’s working on. One of those problems is making documents machine-consumable, especially the PDF, a file type traditionally used for archiving. That’s where Doug and his team come in. In our conversation, we discuss the multimodal approach they’ve taken to identify, interpret, contextualize, and extract elements such as tables from a document, the challenges they’ve faced when dealing with tables, and how they evaluate model performance on them. We also explore how he’s handled generalizing across different formats, how much fine-tuning is needed to be effective, the problems that appear on the NLP side of things, and how deep learning models are being leveraged within the group.
The complete show notes for this episode can be found at