Papers

  • Interactron: Embodied Adaptive Object Detection

    Klemen Kotar, Roozbeh Mottaghi · CVPR 2022. Over the years, various methods have been proposed for the problem of object detection. Recently, we have witnessed great strides in this domain owing to the emergence of powerful deep neural networks. However, there are typically two main assumptions common…
  • Multi-Modal Answer Validation for Knowledge-Based VQA

    Jialin Wu, Jiasen Lu, Ashish Sabharwal, R. Mottaghi · AAAI 2022. The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in a variety of forms, including visual, textual, and commonsense…
  • Vessel Detection in Sentinel-1 Imagery

    Favyen Bastani, Piper Wolters, Rose Hendrix, Joseph Ferdinando, Ani Kembhavi · AI2 whitepaper 2022. In this document, we detail the approach in our xView3 submission. The xView3 dataset presents the challenge of detecting vessels and other maritime objects in synthetic aperture radar (SAR) images captured by the ESA’s Sentinel-1 satellite. The dataset…
  • Bridging the Imitation Gap by Adaptive Insubordination

    Luca Weihs, Unnat Jain, Jordi Salvador, S. Lazebnik, Aniruddha Kembhavi, A. Schwing · arXiv 2021. Why do agents often obtain better reinforcement learning policies when imitating a worse expert? We show that privileged information used by the expert is marginalized in the learned agent policy, resulting in an "imitation gap." Prior work bridges this gap…
  • Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

    Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, A. Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi · arXiv 2021. Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multimodal gestures (e.g., pointing with a finger, or an arrow in a diagram). We…
  • Container: Context Aggregation Network

    Peng Gao, Jiasen Lu, Hongsheng Li, R. Mottaghi, Aniruddha Kembhavi · arXiv 2021. Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers – originally introduced in natural language processing – have been increasingly adopted in computer vision…
  • Factorizing Perception and Policy for Interactive Instruction Following

    Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim, R. Mottaghi, Jonghyun Choi · arXiv 2021. Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents. The ‘interactive instruction following’ task attempts to make progress towards building agents that jointly navigate…
  • PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

    Rowan Zellers, Ari Holtzman, Matthew E. Peters, R. Mottaghi, Aniruddha Kembhavi, A. Farhadi, Yejin Choi · ACL 2021. We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just…
  • RobustNav: Towards Benchmarking Robustness in Embodied Navigation

    Prithvijit Chattopadhyay, Judy Hoffman, R. Mottaghi, Aniruddha Kembhavi · arXiv 2021. As an attempt towards assessing the robustness of embodied navigation agents, we propose ROBUSTNAV, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions – affecting RGB inputs – and dynamics…
  • Pushing it out of the Way: Interactive Visual Navigation

    Kuo-Hao Zeng, Luca Weihs, A. Farhadi, R. Mottaghi · arXiv 2021. We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the…