Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Grounded Situation Recognition
We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with…
Spatially Aware Multimodal Transformers for TextVQA
Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop this assistive technology, we study the TextVQA task, i.e., reasoning about text in images…
VisualCOMET: Reasoning About the Dynamic Context of a Still Image
Even from a single frame of a still image, people can reason about the dynamic story of the image before, after, and beyond the frame. For example, given an image of a man struggling to stay afloat…
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions…
Butterfly Transform: An Efficient FFT Based Neural Architecture Design
In this paper, we introduce the Butterfly Transform (BFT), a lightweight channel fusion method that reduces the computational complexity of point-wise convolutions from O(n^2) of conventional…
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has…
Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects
When we humans look at a video of human-object interaction, we can not only infer what is happening but we can even extract actionable information and imitate those interactions. On the other hand,…
Visual Reaction: Learning to Play Catch with Your Drone
In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.…
What's Hidden in a Randomly Weighted Neural Network?
Training a neural network is synonymous with learning the values of the weights. In contrast, we demonstrate that randomly weighted neural networks contain subnetworks which achieve impressive…
Soft Threshold Weight Reparameterization for Learnable Sparsity
Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic…