Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
We study the task of directly modelling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard…
AI2-THOR: An Interactive 3D Environment for Visual AI
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available online. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate…
Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question…
Asynchronous Temporal Fields for Action Recognition
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and…
Bidirectional Attention Flow for Machine Comprehension
Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been…
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the…
LCNN: Lookup-based Convolutional Neural Network
Porting state-of-the-art deep learning algorithms to resource-constrained compute platforms (e.g., VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for…
Query-Reduction Networks for Question Answering
In this paper, we study the problem of question answering when reasoning over multiple facts is required. We propose Query-Reduction Network (QRN), a variant of Recurrent Neural Network (RNN) that…
See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content
Humans have rich understanding of liquid containers and their contents; for example, we can effortlessly pour water from a pitcher to a cup. Doing so requires estimating the volume of the cup,…
Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning
Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly)…