Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, P. Clark
2021
arXiv

We present the ARC-DA dataset, a direct-answer (“open response”, “freeform”) version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community,… 

GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Daniel S. Weld
2021
arXiv

Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited… 

CLUE: A Chinese Language Understanding Evaluation Benchmark

L. Xu, X. Zhang, L. Li, et al.
2020
COLING

We introduce CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different tasks, including single-sentence classification, sentence pair classification, and machine… 

Belief Propagation Neural Networks

J. Kuck, Shuvam Chakraborty, Hao Tang, S. Ermon
2020
NeurIPS

Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with… 

Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

Alon Talmor, Oyvind Tafjord, Peter Clark, Jonathan Berant
2020
NeurIPS • Spotlight Presentation

To what extent can a neural network systematically reason over symbolic facts? Evidence suggests that large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is… 

From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

Peter Clark, Oren Etzioni, Daniel Khashabi, Michael Schmitz
2020
AI Magazine

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best… 

Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

Atticus Geiger, Kyle Richardson, Christopher Potts
2020
EMNLP • BlackboxNLP Workshop

We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation… 

A Dataset for Tracking Entities in Open Domain Procedural Text

Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Eduard Hovy
2020
EMNLP

We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using… 

A Simple Yet Strong Pipeline for HotpotQA

Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal
2020
EMNLP

State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition,… 

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi
2020
EMNLP

Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all…