Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Actor and Observer: Joint Modeling of First and Third-Person Videos
Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer…
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To…
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module,…
Imagine This! Scripts to Compositions to Videos
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present…
IQA: Visual Question Answering in Interactive Environments
We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene…
Transferring Common-Sense Knowledge for Object Detection
We propose the idea of transferring common-sense knowledge from source categories to target categories for scalable object detection. In our setting, the training data for the source categories have…
Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new…
SeGAN: Segmenting and Generating the Invisible
Objects often occlude each other in scenes; Inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and…
Structured Set Matching Networks for One-Shot Part Labeling
Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning. However, understanding diagrams using natural image understanding approaches requires large…
Who Let The Dogs Out? Modeling Dog Behavior From Visual Data
We study the task of directly modelling a visually intelligent agent. Computer vision typically focuses on solving various subtasks related to visual intelligence. We depart from this standard…