Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Simple but Effective: CLIP Embeddings for Embodied AI

Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi
2022
CVPR

Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation. We…

Domain Mismatch Doesn’t Always Prevent Cross-Lingual Transfer Learning

Daniel Edmiston, Phillip Keung, Noah A. Smith
2022
LREC

Cross-lingual transfer learning without labeled target language data or parallel text has been surprisingly effective in zero-shot cross-lingual classification, question answering, unsupervised… 

Towards General Purpose Vision Systems

Tanmay Gupta, A. Kamath, Aniruddha Kembhavi, Derek Hoiem
2022
CVPR

A special purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation such as adding an output head… 

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

Rowan Zellers, Jiasen Lu, Ximing Lu, Yejin Choi
2022
CVPR

As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that…

What do navigation agents learn about their environment?

Kshitij Dwivedi, G. Roig, Aniruddha Kembhavi, Roozbeh Mottaghi
2022
arXiv

Today’s state-of-the-art visual navigation agents typically consist of large deep learning models trained end-to-end. Such models offer little to no interpretability about the learned skills or the…

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jiasen Lu, Christopher Clark, Rowan Zellers, Aniruddha Kembhavi
2022
arXiv

We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation and image generation,… 

DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models

Gregor Betz, Kyle Richardson
2022
*SEM

In this paper, we present and implement a multi-dimensional, modular framework for performing deep argument analysis (DeepA2) using current pre-trained language models (PTLMs). ArgumentAnalyst – a… 

Correcting a coarse-grid climate model in multiple climates by machine learning from global 25-km resolution simulations

Spencer K. Clark, Noah D. Brenowitz, Brian Henn, Lucas M. Harris
2022
Earth and Space Science Open Archive

Bretherton et al. (2022, https://doi.org/10.1029/2021MS002794) demonstrated a successful approach for using machine learning (ML) to help a coarse-resolution global atmosphere model with real… 

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

Sheshera Mysore, Arman Cohan, Tom Hope
2022
NAACL

We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the… 

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Roozbeh Mottaghi
2022
arXiv

The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a…