Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Efficient Adaptation of Pretrained Transformers for Abstractive Summarization
Large-scale learning of transformer language models has yielded improvements on a variety of natural language understanding tasks. Whether they can be effectively adapted for summarization, however,…
Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors
We present a simple and effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. During network training, we carefully feed in…
ELASTIC: Improving CNNs by Instance Specific Scaling Policies
Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies…
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise…
From Recognition to Cognition: Visual Commonsense Reasoning
Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people’s actions, goals,…
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. After we learn a task, we keep learning about it while…
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA…
Two Body Problem: Collaborative Visual Task Completion
Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been…
Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph
Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts. For example, a relationship {man, open, door} involves a complex…
Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at…