Papers

  • OCNLI: Original Chinese Natural Language Inference

    H. Hu, Kyle Richardson, Liang Xu, L. Li, Sandra Kübler, L. Moss. Findings of EMNLP, 2020. Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for…
  • UnifiedQA: Crossing Format Boundaries With a Single QA System

    Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi. Findings of EMNLP, 2020. Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries… (a format-encoding sketch appears after this list)
  • UnQovering Stereotyping Biases via Underspecified Questions

    Tao Li, Tushar Khot, Daniel Khashabi, Ashish Sabharwal, Vivek Srikumar. Findings of EMNLP, 2020. While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions… (a probing sketch appears after this list)
  • What-if I ask you to explain: Explaining the effects of perturbations in procedural text

    Dheeraj Rajagopal, Niket Tandon, Peter Clark, Bhavana Dalvi, Eduard H. Hovy. Findings of EMNLP, 2020. We address the task of explaining the effects of perturbations in procedural text, an important test of process comprehension. Consider a passage describing a rabbit's life-cycle: humans can easily explain the effect on the rabbit population if a female…
  • "You are grounded!": Latent Name Artifacts in Pre-trained Language Models

    Vered Shwartz, Rachel Rudinger, Oyvind Tafjord. EMNLP, 2020. Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with…
  • Evaluating Models' Local Decision Boundaries via Contrast Sets

    M. Gardner, Y. Artzi, V. Basmova, J. Berant, B. Bogin, S. Chen, P. Dasigi, D. Dua, Y. Elazar, A. Gottumukkala, N. Gupta, H. Hajishirzi, G. Ilharco, D. Khashabi, K. Lin, J. Liu, N. F. Liu, P. Mulcaire, Q. Ning, S. Singh, N. A. Smith, S. Subramanian, et al. Findings of EMNLP, 2020. Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on… (a consistency-metric sketch appears after this list)
  • What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge

    Kyle Richardson, Ashish Sabharwal. TACL, 2020. Open-domain question answering (QA) is known to involve several underlying knowledge and reasoning challenges, but are models actually learning such knowledge when trained on benchmark tasks? To investigate this, we introduce several new challenge tasks that…
  • AdaWISH: Faster Discrete Integration via Adaptive Quantiles

    Fan Ding, Hanjing Wang, Ashish Sabharwal, Yexiang Xue. ECAI, 2020. Discrete integration in a high dimensional space of $n$ variables poses fundamental challenges. The WISH algorithm reduces the intractable discrete integration problem into $n$ optimization queries subject to randomized constraints, obtaining a constant… (a brute-force WISH sketch appears after this list)
  • Approximating the Permanent by Sampling from Adaptive Partitions

    Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon. UAI, 2020. Computing the permanent of a non-negative matrix is a core problem with practical applications ranging from target tracking to statistical thermodynamics. However, this problem is also #P-complete, which leaves little hope for finding an exact solution that… (an exact-baseline sketch appears after this list)
  • Multi-class Hierarchical Question Classification for Multiple Choice Science Exams

    Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark. IJCAI, 2020. Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data…
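
The UnifiedQA entry above argues that QA formats need not be siloed. Below is a minimal sketch of the underlying idea, casting heterogeneous formats into one text-to-text encoding so a single seq2seq model can train on their union; the separators and field order here are illustrative assumptions, not necessarily the released models' exact encoding.

```python
# Sketch: normalizing heterogeneous QA formats into one text-to-text input,
# in the spirit of UnifiedQA. Separators are assumptions for illustration.

def encode_extractive(question: str, context: str) -> str:
    """Extractive QA: the model must copy an answer span from the context."""
    return f"{question} \\n {context}"

def encode_multiple_choice(question: str, choices: list, context: str = "") -> str:
    """Multiple choice: choices are serialized inline as (A), (B), ..."""
    labels = "ABCDEFGH"
    choice_str = " ".join(f"({labels[i]}) {c}" for i, c in enumerate(choices))
    return f"{question} \\n {choice_str}" + (f" \\n {context}" if context else "")

# Both formats now share one interface: text in, answer text out.
print(encode_multiple_choice("What do worms eat?", ["grass", "leaves", "birds"]))
```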
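
For UnQover, here is a hedged sketch of what probing a QA model with an underspecified question could look like: the context supports neither candidate answer, so any systematic preference is an artifact. `qa_score` is a hypothetical stand-in for the probed model, and the paper's actual scoring and bias metrics are more elaborate.

```python
# Sketch of an underspecified-question probe, in the spirit of UnQover.

def qa_score(context: str, question: str, answer: str) -> float:
    # Hypothetical: plug in a real QA model here (e.g., an extractive
    # reader's probability of selecting `answer` as the span).
    raise NotImplementedError

def bias_gap(name_a: str, name_b: str, attribute: str) -> float:
    """Positive value => the model prefers name_a for the attribute even
    though the context gives no evidence for either answer."""
    context = f"{name_a} got off the flight. {name_b} got off the flight."
    question = f"Who {attribute}?"
    return qa_score(context, question, name_a) - qa_score(context, question, name_b)
```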
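
For the contrast-sets paper, a small sketch of the consistency-style evaluation the abstract alludes to: a model is credited for a bundle only when it handles the original example and all of its small perturbations. The data layout is an assumption, not the released format.

```python
# Sketch: consistency on contrast sets. `contrast_sets` is a list of
# bundles; each bundle is one original example plus its perturbations,
# given as (input, gold_label) pairs.

def consistency(model, contrast_sets):
    solved = sum(
        all(model(x) == y for x, y in bundle)
        for bundle in contrast_sets
    )
    return solved / len(contrast_sets)
```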
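
The AdaWISH abstract summarizes the WISH reduction from discrete integration to optimization under randomized constraints. Below is a brute-force sketch of that baseline reduction (not AdaWISH's adaptive variant), assuming an exhaustive optimizer and an illustrative repetition count; real implementations replace the exhaustive max with a combinatorial solver.

```python
import itertools
import random
import statistics

# Sketch of the WISH estimator: approximate Z = sum_x w(x) over x in {0,1}^n
# from MAP queries under i random parity (XOR) constraints, i = 0..n.

def map_with_xors(w, n, num_xors):
    """Max of w(x) over assignments satisfying num_xors random parity constraints."""
    xors = [(random.getrandbits(n), random.getrandbits(1)) for _ in range(num_xors)]
    best = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        x = sum(b << i for i, b in enumerate(bits))
        if all(bin(a & x).count("1") % 2 == parity for a, parity in xors):
            best = max(best, w(bits))
    return best

def wish(w, n, reps=5):
    # Median over repetitions stabilizes each constrained optimum.
    m = [statistics.median(map_with_xors(w, n, i) for _ in range(reps))
         for i in range(n + 1)]
    return m[0] + sum(m[i] * 2 ** (i - 1) for i in range(1, n + 1))
```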
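
Finally, for the permanent paper: the quantity Kuck et al. approximate can be computed exactly (but in exponential time) with Ryser's inclusion-exclusion formula. It is sketched here only as a reference baseline, not as the paper's sampling method.

```python
from itertools import combinations

# Exact O(2^n * n^2) permanent via Ryser's formula:
# perm(A) = (-1)^n * sum over nonempty column subsets S of
#           (-1)^|S| * prod_i sum_{j in S} A[i][j].

def permanent(A):
    n = len(A)
    total = 0.0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):
            prod = 1.0
            for row in A:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** r * prod
    return (-1) ** n * total

assert permanent([[1, 2], [3, 4]]) == 1 * 4 + 2 * 3  # ad + bc = 10
```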