Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training

Daniel Khashabi, Yeganeh Kordi, Hannaneh Hajishirzi

2022

arXiv

We present UNIFIEDQA-v2, a QA model built with the same process as UNIFIEDQA, except that it utilizes more supervision – roughly 3× the number of datasets used for UNIFIEDQA. This generally leads to…

Vessel Detection in Sentinel-1 Imagery

Favyen Bastani, Piper Wolters, Rose Hendrix, Ani Kembhavi

2022

AI2 whitepaper

In this document, we detail the approach in our xView3 submission. The xView3 dataset presents the challenge of detecting vessels and other maritime objects in synthetic aperture radar (SAR) images…

Tropical Cirrus in Global Storm‐Resolving Models: 2. Cirrus Life Cycle and Top‐of‐Atmosphere Radiative Fluxes

S. M. Turbeville, J. M. Nugent, T. Ackerman, P. Blossey

2021

Earth and Space Science

Cirrus clouds of various thicknesses and radiative characteristics extend over much of the tropics, especially around deep convection. They are difficult to observe due to their high altitude and…

Tropical Cirrus in Global Storm‐Resolving Models: 1. Role of Deep Convection

J. Nugent, S. M. Turbeville, C. Bretherton, T. Ackerman

2021

Earth and Space Science

Pervasive cirrus clouds in the upper troposphere and tropical tropopause layer (TTL) influence the climate by altering the top‐of‐atmosphere radiation balance and stratospheric water vapor budget.…

DREAM: Improving Situational QA by First Elaborating the Situation

Yuling Gu, Bhavana Dalvi Mishra, Peter Clark

2021

NAACL

When people answer questions about a specific situation, e.g., "I cheated on my mid-term exam last week. Was that wrong?", cognitive science suggests that they form a mental picture of that…

Inherently Explainable Reinforcement Learning in Natural Language

Xiangyu Peng, Mark O. Riedl, Prithviraj Ammanabrolu

2021

arXiv

We focus on the task of creating a reinforcement learning agent that is inherently explainable—with the ability to produce immediate local explanations by thinking out loud while performing a task…

CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

Alon Talmor, Ori Yoran, Ronan Le Bras, Jonathan Berant

2021

NeurIPS

Constructing benchmarks that test the abilities of modern natural language understanding models is difficult – pre-trained language models exploit artifacts in benchmarks to achieve human…

FLEX: Unifying Evaluation for Few-Shot NLP

Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy

2021

NeurIPS

Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental…

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, Z. Harchaoui

2021

NeurIPS

As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure…

MERLOT: Multimodal Neural Script Knowledge Models

Rowan Zellers, Ximing Lu, Jack Hessel, Yejin Choi

2021

NeurIPS

As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model…
