Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Visual Semantic Navigation using Scene Priors
How do humans navigate to target objects in novel scenes? Do we use the semantic/functional priors we have built over years to efficiently search and navigate? For example, to search for mugs, we…
Actor and Observer: Joint Modeling of First and Third-Person Videos
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer…
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To…
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module,…
Imagine This! Scripts to Compositions to Videos
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present…
IQA: Visual Question Answering in Interactive Environments
We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene…
Transferring Common-Sense Knowledge for Object Detection
We propose the idea of transferring common-sense knowledge from source categories to target categories for scalable object detection. In our setting, the training data for the source categories have…
Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new…
SeGAN: Segmenting and Generating the Invisible
Objects often occlude each other in scenes; Inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and…
Structured Set Matching Networks for One-Shot Part Labeling
Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning. However, understanding diagrams using natural image understanding approaches requires large…