Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models have permeated the machine learning ecosystem. Given an image, these benchmarks…
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires…
SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
Remote sensing images are useful for a wide variety of planet monitoring applications, from tracking deforestation to tackling illegal fishing. The Earth is extremely diverse -- the amount of…
Objaverse-XL: A Universe of 10M+ 3D Objects
Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the…
Phone2Proc: Bringing Robust Robots Into Our Chaotic World
Training embodied agents in simulation has become mainstream for the embodied AI community. However, these agents often struggle when deployed in the physical world due to their inability to…
Visual Programming: Compositional visual reasoning without training
We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-specific training. Instead,…
EXCALIBUR: Encouraging and Evaluating Embodied Exploration
Experience precedes understanding. Humans constantly explore and learn about their environment out of curiosity, gather information, and update their models of the world. On the other hand,…
Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL)…
Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics
A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the"move ahead"action will always move the agent forward by a fixed…
When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning
Episodic training, where an agent's environment is reset to some initial condition after every success or failure, is the de facto standard when training embodied reinforcement learning (RL) agents.…