Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Two Body Problem: Collaborative Visual Task Completion
Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been…
ELASTIC: Improving CNNs by Instance Specific Scaling Policies
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have similar theme: a set of intuitive and manually designed policies…
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. After we learn a task, we keep learning about it while…
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA…
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2 , for modeling visual and sequential data. Our network uses group point-wise and depth-wise…
From Recognition to Cognition: Visual Commonsense Reasoning
Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people’s actions, goals,…
Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph
Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts. For example, a relationship \{man, open, door\} involves a complex…
Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors
We present a simple and effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. During network training, we carefully feed in…
Sentence Mover's Similarity: Automatic Evaluation for Multi-Sentence Texts
For evaluating machine-generated texts, automatic methods hold the promise of avoiding collection of human judgments, which can be expensive and time-consuming. The most common automatic metrics,…
Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at…