Papers

  • ManipulaTHOR: A Framework for Visual Object Manipulation

    Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, R. Mottaghi. arXiv, 2021. The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact…
  • Contrasting Contrastive Self-Supervised Representation Learning Pipelines

    Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, R. Mottaghi. IEEE/CVF International Conference on Computer Vision (ICCV), 2021. In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much is yet to be understood about how different training methods…
  • GridToPix: Training Embodied Agents with Minimal Supervision

    Unnat Jain, Iou-Jen Liu, S. Lazebnik, Aniruddha Kembhavi, Luca Weihs, A. Schwing. ICCV, 2021. While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only terminal…
  • Learning Curves for Analysis of Deep Networks

    Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal Shlapentokh-Rothman. arXiv, 2021. A learning curve models a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to analyze…
  • Visual Semantic Role Labeling for Video Understanding

    Arka Sadhu, Tanmay Gupta, Mark Yatskar, R. Nevatia, Aniruddha Kembhavi. CVPR, 2021. We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill…
  • Visual Room Rearrangement

    Luca Weihs, Matt Deitke, Aniruddha Kembhavi, R. Mottaghi. arXiv, 2021. There has been significant recent progress in the field of Embodied AI, with researchers developing models and algorithms enabling embodied agents to navigate and interact within completely unseen environments. In this paper, we propose a new dataset and…
  • What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

    Kiana Ehsani, Daniel Gordon, T. Nguyen, R. Mottaghi, A. Farhadi. ICLR, 2021. Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we…
  • Learning Generalizable Visual Representations via Interactive Gameplay

    Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, R. Mottaghi, A. Farhadi. ICLR, 2021. A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision…
  • Learning About Objects by Learning to Interact with Them

    Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi. NeurIPS, 2020. Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision…
  • X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

    Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, and Aniruddha Kembhavi. EMNLP, 2020. Mirroring the success of masked language models, vision-and-language counterparts like ViLBERT, LXMERT and UNITER have achieved state-of-the-art performance on a variety of multimodal discriminative tasks like visual question answering and visual grounding…