Allen Institute for AI


Viewing 1-10 of 128 papers
  • CLUE: A Chinese Language Understanding Evaluation Benchmark

    L. Xu, X. Zhang, L. Li, H. Hu, C. Cao, W. Liu, J. Li, Y. Li, K. Sun, Y. Xu, Y. Cui, C. Yu, Q. Dong, Y. Tian, D. Yu, B. Shi, J. Zeng, R. Wang, W. Xie, Y. Li, Y. Patterson, Z. Tian, Y. Zhang, H. Zhou, S. Liu, Q. Zhao, C. Yue, X. Zhang, Z. Yang • 2020 • We introduce CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different tasks, including single-sentence classification, sentence pair classification, and machine reading comprehension. We evaluate CLUE on a number of existing full-network pre-trained models for…
  • Belief Propagation Neural Networks

    J. Kuck, Shuvam Chakraborty, Hao Tang, R. Luo, Jiaming Song, A. Sabharwal, S. Ermon • NeurIPS 2020 • Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a…
  • Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

    Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant • NeurIPS (Spotlight Presentation) 2020 • To what extent can a neural network systematically reason over symbolic facts? Evidence suggests that large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control. Recently, it has been shown that Transformer-based models succeed in consistent…
  • Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

    Atticus Geiger, Kyle Richardson, Christopher Potts • BlackboxNLP 2020 • We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural…
  • A Dataset for Tracking Entities in Open Domain Procedural Text

    Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy • EMNLP 2020 • We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous…
  • IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

    James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi • EMNLP 2020 • Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system’s performance at identifying a…
  • OCNLI: Original Chinese Natural Language Inference

    H. Hu, Kyle Richardson, Liang Xu, L. Li, Sandra Kübler, L. Moss • Findings of EMNLP 2020 • Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this…
  • A Simple Yet Strong Pipeline for HotpotQA

    Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal • EMNLP 2020 • State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition, graph-based reasoning, and question decomposition. However, does their strong performance on popular…
  • Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning

    H. Trivedi, N. Balasubramanian, Tushar Khot, A. Sabharwal • EMNLP 2020 • Has there been real progress in multi-hop question answering? Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts. This limits our ability to measure true progress and defeats the purpose of building multi-hop QA datasets…
  • Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering.

    Harsh Jhamtani, P. Clark • EMNLP 2020 • Despite the rapid progress in multihop question answering (QA), models still have trouble explaining why an answer is correct, with limited explanation training data available to learn from. To address this, we introduce three explanation datasets in which explanations formed from corpus facts are…