Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning

H. Trivedi, N. Balasubramanian, Tushar Khot, A. Sabharwal
2020
EMNLP

Has there been real progress in multi-hop question-answering? Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts. This… 

Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering.

Harsh Jhamtani, P. Clark
2020
EMNLP

Despite the rapid progress in multihop question-answering (QA), models still have trouble explaining why an answer is correct, with limited explanation training data available to learn from. To… 

More Bang for Your Buck: Natural Perturbation for Robust Question Answering

Daniel Khashabi, Tushar Khot, Ashish Sabharwal
2020
EMNLP

While recent models have achieved human-level scores on many NLP datasets, we observe that they are considerably sensitive to small changes in input. As an alternative to the standard approach of… 

OCNLI: Original Chinese Natural Language Inference

H. Hu, Kyle Richardson, Liang Xu, L. Moss
2020
Findings of EMNLP

Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has… 

UnifiedQA: Crossing Format Boundaries With a Single QA System

Daniel Khashabi, Sewon Min, Tushar Khot, Hannaneh Hajishirzi
2020
Findings of EMNLP

Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit… 

UnQovering Stereotyping Biases via Underspecified Questions

Tao Li, Tushar Khot, Daniel Khashabi, Vivek Srikumar
2020
Findings of EMNLP

While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework… 

What-if I ask you to explain: Explaining the effects of perturbations in procedural text

Dheeraj Rajagopal, Niket Tandon, Peter Clark, Eduard H. Hovy
2020
Findings of EMNLP

We address the task of explaining the effects of perturbations in procedural text, an important test of process comprehension. Consider a passage describing a rabbit's life-cycle: humans can easily… 

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

Vered Shwartz, Rachel Rudinger, Oyvind Tafjord
2020
EMNLP

Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g.,… 

Evaluating Models' Local Decision Boundaries via Contrast Sets

M. Gardner, Y. Artzi, V. Basmova, et al.
2020
Findings of EMNLP

Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading:… 

What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge

Kyle Richardson, Ashish Sabharwal
2020
TACL

Open-domain question answering (QA) is known to involve several underlying knowledge and reasoning challenges, but are models actually learning such knowledge when trained on benchmark tasks? To…