VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering

Cangea, Cătălina; Belilovsky, Eugene; Liò, Pietro; Courville, Aaron

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/297408

Repository DOI

https://doi.org/10.17863/CAM.44469

Type

Conference Object

Abstract

Embodied Question Answering (EQA) is a recently proposed task in which an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial attempts that combine standard vision and language methods with imitation and reinforcement learning algorithms have shown that EQA might be too challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset, which contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while covering a much wider variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.

Keywords

cs.CV, cs.AI, cs.CL, cs.LG

Journal Title

CoRR

Conference Name

The British Machine Vision Conference (BMVC) 2019

Publisher DOI

https://doi.org/10.17863/CAM.44469

Rights

http://www.rioxx.net/licenses/all-rights-reserved

Sponsorship

CC is funded by DREAM CDT and was supported by Mila while in Montréal. EB is funded by IVADO. We also thank the University of Cambridge Research Computing Services for providing HPC cluster resources.

Collections

Cambridge University Research Outputs