Papers

  • Interactron: Embodied Adaptive Object Detection

    Klemen Kotar, Roozbeh Mottaghi · CVPR 2022. Over the years, various methods have been proposed for the problem of object detection. Recently, we have witnessed great strides in this domain owing to the emergence of powerful deep neural networks. However, there are typically two main assumptions common…
  • Multi-Modal Answer Validation for Knowledge-Based VQA

    Jialin Wu, Jiasen Lu, Ashish Sabharwal, R. Mottaghi · AAAI 2022. The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in a variety of forms, including visual, textual, and commonsense…
  • Vessel Detection in Sentinel-1 Imagery

    Favyen Bastani, Piper Wolters, Rose Hendrix, Joseph Ferdinando, Ani Kembhavi · AI2 whitepaper 2022. In this document, we detail the approach in our xView3 submission. The xView3 dataset presents the challenge of detecting vessels and other maritime objects in synthetic aperture radar (SAR) images captured by the ESA’s Sentinel-1 satellite. The dataset…
  • Bridging the Imitation Gap by Adaptive Insubordination

    Luca Weihs, Unnat Jain, Jordi Salvador, S. Lazebnik, Aniruddha Kembhavi, A. Schwing · arXiv 2021. Why do agents often obtain better reinforcement learning policies when imitating a worse expert? We show that privileged information used by the expert is marginalized in the learned agent policy, resulting in an "imitation gap." Prior work bridges this gap…
  • Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

    Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, A. Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi · arXiv 2021. Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multimodal gestures (e.g., pointing with a finger, or an arrow in a diagram). We…
  • Container: Context Aggregation Network

    Peng Gao, Jiasen Lu, Hongsheng Li, R. Mottaghi, Aniruddha Kembhavi · arXiv 2021. Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers – originally introduced in natural language processing – have been increasingly adopted in computer vision…
  • Factorizing Perception and Policy for Interactive Instruction Following

    Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim, R. Mottaghi, Jonghyun Choi · arXiv 2021. Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents. The ‘interactive instruction following’ task attempts to make progress towards building agents that jointly navigate…
  • PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

    Rowan Zellers, Ari Holtzman, Matthew E. Peters, R. Mottaghi, Aniruddha Kembhavi, A. Farhadi, Yejin Choi · ACL 2021. We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just…
  • RobustNav: Towards Benchmarking Robustness in Embodied Navigation

    Prithvijit Chattopadhyay, Judy Hoffman, R. Mottaghi, Aniruddha Kembhavi · arXiv 2021. As an attempt towards assessing the robustness of embodied navigation agents, we propose ROBUSTNAV, a framework to quantify the performance of embodied navigation agents when exposed to a wide variety of visual corruptions – affecting RGB inputs – and dynamics…
  • Pushing it out of the Way: Interactive Visual Navigation

    Kuo-Hao Zeng, Luca Weihs, A. Farhadi, R. Mottaghi · arXiv 2021. We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the…