Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Objaverse: A Universe of Annotated 3D Objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Ali Farhadi

2022

CVPR

Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce…

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

Kunal Pratap Singh, Luca Weihs, Alvaro Herrasti, Roozbeh Mottaghi

2022

arXiv

Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be…

ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Roozbeh Mottaghi

2022

NeurIPS

Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories…

Webly Supervised Concept Expansion for General Purpose Vision Models

Amita Kamath, Christopher Clark, Tanmay Gupta, Aniruddha Kembhavi

2022

ECCV

General purpose vision (GPV) systems [25] are models that are designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and…

Towards Disturbance-Free Visual Mobile Manipulation

Tianwei Ni, Kiana Ehsani, Luca Weihs, Jordi Salvador

2022

arXiv

Deep reinforcement learning has shown promising results on an abundance of robotic tasks in simulation, including visual navigation and manipulation. Prior work generally aims to build embodied…

Benchmarking Progress to Infant-Level Physical Reasoning in AI

Luca Weihs, Amanda Rose Yuile, Renée Baillargeon, Aniruddha Kembhavi

2022

TMLR

To what extent do modern AI systems comprehend the physical world? We introduce the open-access Infant-Level Physical Reasoning Benchmark (InfLevel) to gain insight into this question. We evaluate…

I can’t believe there’s no images! Learning Visual Tasks Using Only Language Supervision

Sophia Gu, Christopher Clark, Aniruddha Kembhavi

2022

ICCV

Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such…

Simple but Effective: CLIP Embeddings for Embodied AI

Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi

2022

CVPR

Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation. We…

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

Rowan Zellers, Jiasen Lu, Ximing Lu, Yejin Choi

2022

CVPR

As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that…

Towards General Purpose Vision Systems

Tanmay Gupta, A. Kamath, Aniruddha Kembhavi, Derek Hoiem

2022

CVPR

A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head…
