Papers

  • OCNLI: Original Chinese Natural Language Inference

    H. Hu, Kyle Richardson, Liang Xu, L. Li, Sandra Kübler, L. Moss. Findings of EMNLP, 2020. Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for…
  • UnifiedQA: Crossing Format Boundaries With a Single QA System

    Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi. Findings of EMNLP, 2020. Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries… (a format-encoding sketch appears after this list)
  • UnQovering Stereotyping Biases via Underspecified Questions

    Tao Li, Tushar Khot, Daniel Khashabi, Ashish Sabharwal, Vivek Srikumar. Findings of EMNLP, 2020. While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions… (a probing sketch appears after this list)
  • What-if I ask you to explain: Explaining the effects of perturbations in procedural text

    Dheeraj Rajagopal, Niket Tandon, Peter Clark, Bhavana Dalvi, Eduard H. Hovy. Findings of EMNLP, 2020. We address the task of explaining the effects of perturbations in procedural text, an important test of process comprehension. Consider a passage describing a rabbit's life-cycle: humans can easily explain the effect on the rabbit population if a female…
  • "You are grounded!": Latent Name Artifacts in Pre-trained Language Models

    Vered Shwartz, Rachel Rudinger, Oyvind Tafjord. EMNLP, 2020. Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with…
  • Evaluating Models' Local Decision Boundaries via Contrast Sets

    M. Gardner, Y. Artzi, V. Basmova, J. Berant, B. Bogin, S. Chen, P. Dasigi, D. Dua, Y. Elazar, A. Gottumukkala, N. Gupta, H. Hajishirzi, G. Ilharco, D. Khashabi, K. Lin, J. Liu, N. F. Liu, P. Mulcaire, Q. Ning, S. Singh, N. A. Smith, S. Subramanian, et al. Findings of EMNLP, 2020. Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on… (a consistency-metric sketch appears after this list)
  • What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge

    Kyle Richardson, Ashish Sabharwal. TACL, 2020. Open-domain question answering (QA) is known to involve several underlying knowledge and reasoning challenges, but are models actually learning such knowledge when trained on benchmark tasks? To investigate this, we introduce several new challenge tasks that…
  • AdaWISH: Faster Discrete Integration via Adaptive Quantiles

    Fan Ding, Hanjing Wang, Ashish Sabharwal, Yexiang Xue. ECAI, 2020. Discrete integration in a high dimensional space of $n$ variables poses fundamental challenges. The WISH algorithm reduces the intractable discrete integration problem into $n$ optimization queries subject to randomized constraints, obtaining a constant… (a brute-force WISH sketch appears after this list)
  • Approximating the Permanent by Sampling from Adaptive Partitions

    Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon. UAI, 2020. Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that… (an exact-baseline sketch appears after this list)
  • Multi-class Hierarchical Question Classification for Multiple Choice Science Exams

    Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark. IJCAI, 2020. Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data…
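
The UnifiedQA entry above argues that QA formats need not be siloed. Below is a minimal sketch of the underlying idea, casting heterogeneous formats into one text-to-text encoding so a single seq2seq model can train on their union; the separators and field order here are illustrative assumptions, not necessarily the released models' exact encoding.

```python
# Sketch: normalizing heterogeneous QA formats into one text-to-text input,
# in the spirit of UnifiedQA. Separators are assumptions for illustration.

def encode_extractive(question: str, context: str) -> str:
    """Extractive QA: the model must copy an answer span from the context."""
    return f"{question} \\n {context}"

def encode_multiple_choice(question: str, choices: list, context: str = "") -> str:
    """Multiple choice: choices are serialized inline as (A), (B), ..."""
    labels = "ABCDEFGH"
    choice_str = " ".join(f"({labels[i]}) {c}" for i, c in enumerate(choices))
    return f"{question} \\n {choice_str}" + (f" \\n {context}" if context else "")

# Both formats now share one interface: text in, answer text out.
print(encode_multiple_choice("What do worms eat?", ["grass", "leaves", "birds"]))
```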
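
For UnQover, here is a hedged sketch of what probing a QA model with an underspecified question could look like: the context supports neither candidate answer, so any systematic preference is an artifact. `qa_score` is a hypothetical stand-in for the probed model, and the paper's actual scoring and bias metrics are more elaborate.

```python
# Sketch of an underspecified-question probe, in the spirit of UnQover.

def qa_score(context: str, question: str, answer: str) -> float:
    # Hypothetical: plug in a real QA model here (e.g., an extractive
    # reader's probability of selecting `answer` as the span).
    raise NotImplementedError

def bias_gap(name_a: str, name_b: str, attribute: str) -> float:
    """Positive value => the model prefers name_a for the attribute even
    though the context gives no evidence for either answer."""
    context = f"{name_a} got off the flight. {name_b} got off the flight."
    question = f"Who {attribute}?"
    return qa_score(context, question, name_a) - qa_score(context, question, name_b)
```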
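
For the contrast-sets paper, a small sketch of the consistency-style evaluation the abstract alludes to: a model is credited for a bundle only when it handles the original example and all of its small perturbations. The data layout is an assumption, not the released format.

```python
# Sketch: consistency on contrast sets. `contrast_sets` is a list of
# bundles; each bundle is one original example plus its perturbations,
# given as (input, gold_label) pairs.

def consistency(model, contrast_sets):
    solved = sum(
        all(model(x) == y for x, y in bundle)
        for bundle in contrast_sets
    )
    return solved / len(contrast_sets)
```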
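
The AdaWISH abstract summarizes the WISH reduction from discrete integration to optimization under randomized constraints. Below is a brute-force sketch of that baseline reduction (not AdaWISH's adaptive variant), assuming an exhaustive optimizer and an illustrative repetition count; real implementations replace the exhaustive max with a combinatorial solver.

```python
import itertools
import random
import statistics

# Sketch of the WISH estimator: approximate Z = sum_x w(x) over x in {0,1}^n
# from MAP queries under i random parity (XOR) constraints, i = 0..n.

def map_with_xors(w, n, num_xors):
    """Max of w(x) over assignments satisfying num_xors random parity constraints."""
    xors = [(random.getrandbits(n), random.getrandbits(1)) for _ in range(num_xors)]
    best = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        x = sum(b << i for i, b in enumerate(bits))
        if all(bin(a & x).count("1") % 2 == parity for a, parity in xors):
            best = max(best, w(bits))
    return best

def wish(w, n, reps=5):
    # Median over repetitions stabilizes each constrained optimum.
    m = [statistics.median(map_with_xors(w, n, i) for _ in range(reps))
         for i in range(n + 1)]
    return m[0] + sum(m[i] * 2 ** (i - 1) for i in range(1, n + 1))
```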
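
Finally, for the permanent paper: the quantity Kuck et al. approximate can be computed exactly (but in exponential time) with Ryser's inclusion-exclusion formula. It is sketched here only as a reference baseline, not as the paper's sampling method.

```python
from itertools import combinations

# Exact O(2^n * n^2) permanent via Ryser's formula:
# perm(A) = (-1)^n * sum over nonempty column subsets S of
#           (-1)^|S| * prod_i sum_{j in S} A[i][j].

def permanent(A):
    n = len(A)
    total = 0.0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            prod = 1.0
            for row in A:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** r * prod
    return (-1) ** n * total

assert permanent([[1, 2], [3, 4]]) == 1 * 4 + 2 * 3  # ad + bc = 10
```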