
Visual Question Answering: Progress and Challenges

Aishwarya Agrawal (Google DeepMind)

In this talk, I will present our work on a multi-modal AI task called Visual Question Answering (VQA): given an image and a natural language question about the image (e.g., “What kind of store is this?”, “Is it safe to cross the street?”), the machine’s task is to automatically produce an accurate natural language answer (“bakery”, “yes”). Applications of VQA include helping visually impaired users understand their surroundings, helping analysts examine large quantities of surveillance data, teaching children through interactive demos, interacting with personal AI assistants, and making visual social media content more accessible. Specifically, I will give a brief overview of the VQA task, dataset and baseline models, highlight the problem of visual grounding in existing VQA models, and discuss how to address it by proposing 1) a new evaluation protocol, 2) a new model architecture, and 3) a novel objective function. Towards the end of the talk, I will discuss the challenges in VQA that remain unaddressed despite the tremendous progress of the last few years.

 

Speaker bio

Aishwarya Agrawal is a Research Scientist at DeepMind (London office). She completed her PhD at Georgia Tech, working with Dhruv Batra and Devi Parikh. Her research interests lie at the intersection of computer vision, deep learning and natural language processing. The Visual Question Answering (VQA) work by Aishwarya and her colleagues has attracted tremendous interest in a short period of time.

Aishwarya is a recipient of the 2019 Google Fellowship (declined), the Facebook Fellowship 2019-2020 (declined) and the NVIDIA Graduate Fellowship 2018-2019. Aishwarya was selected for the Rising Stars in EECS 2018. She was also a finalist for the Foley Scholars Award 2018 and the Microsoft and Adobe Research Fellowships 2017-2018. As a research intern, Aishwarya has spent time at DeepMind, Microsoft Research and the Allen Institute for Artificial Intelligence.

Aishwarya led the organization of the first VQA challenge and workshop at CVPR 2016 and co-organized the second, third and fourth VQA challenges and workshops at CVPR 2017, 2018 and 2019. As a reviewer, she has served on the program committees of various conferences (CVPR, ICCV, ECCV, NIPS, ICLR) and reviewed for a journal (IJCV). She has twice received an Outstanding Reviewer award (NIPS 2017 and CVPR 2017).

Aishwarya received her bachelor's degree in Electrical Engineering with a minor in Computer Science and Engineering from the Indian Institute of Technology (IIT) Gandhinagar in 2014.

For more info: https://www.cc.gatech.edu/~aagrawal307/

 

 
