Papers

  • ManipulaTHOR: A Framework for Visual Object Manipulation

    Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, R. Mottaghi. arXiv, 2021. The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact…
  • Contrasting Contrastive Self-Supervised Representation Learning Pipelines

    Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, R. Mottaghi. IEEE/CVF International Conference on Computer Vision (ICCV), 2021. In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much is yet to be understood about how different training methods…
  • GridToPix: Training Embodied Agents with Minimal Supervision

    Unnat Jain, Iou-Jen Liu, S. Lazebnik, Aniruddha Kembhavi, Luca Weihs, A. Schwing. ICCV, 2021. While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only terminal…
  • Learning Curves for Analysis of Deep Networks

    Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal Shlapentokh-Rothman. arXiv, 2021. A learning curve models a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to analyze…
  • Visual Semantic Role Labeling for Video Understanding

    Arka Sadhu, Tanmay Gupta, Mark Yatskar, R. Nevatia, Aniruddha Kembhavi. CVPR, 2021. We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill…
  • Visual Room Rearrangement

    Luca Weihs, Matt Deitke, Aniruddha Kembhavi, R. Mottaghi. arXiv, 2021. There has been significant recent progress in the field of Embodied AI, with researchers developing models and algorithms enabling embodied agents to navigate and interact within completely unseen environments. In this paper, we propose a new dataset and…
  • What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

    Kiana Ehsani, Daniel Gordon, T. Nguyen, R. Mottaghi, A. Farhadi. ICLR, 2021. Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we…
  • Learning Generalizable Visual Representations via Interactive Gameplay

    Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, R. Mottaghi, A. Farhadi. ICLR, 2021. A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision…
  • Learning About Objects by Learning to Interact with Them

    Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi. NeurIPS, 2020. Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision…
  • X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

    Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, and Aniruddha Kembhavi. EMNLP, 2020. Mirroring the success of masked language models, vision-and-language counterparts like ViLBERT, LXMERT and UNITER have achieved state-of-the-art performance on a variety of multimodal discriminative tasks like visual question answering and visual grounding…