Viewing 1-10 of 553 papers
  • CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

    Alon Talmor, Ori Yoran, Ronan Le Bras, Chandrasekhar Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant. NeurIPS 2021. Constructing benchmarks that test the abilities of modern natural language understanding models is difficult – pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors…
  • NaturalProofs: Mathematical Theorem Proving in Natural Language

    S. Welleck, Jiachen Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho. NeurIPS 2021. Understanding and creating mathematics using natural mathematical language – the mixture of symbolic and natural language used by humans – is a challenging and important problem for driving progress in machine learning. As a step in this direction, we develop…
  • Teach Me to Explain: A Review of Datasets for Explainable NLP

    Sarah Wiegreffe and Ana Marasović. NeurIPS 2021. Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated explanations. These explanations are used downstream in three ways: as data augmentation to improve performance on a predictive task, as a loss signal to train models to produce…
  • Back to Square One: Bias Detection, Training and Commonsense Disentanglement in the Winograd Schema

    Yanai Elazar, Hongming Zhang, Yoav Goldberg, Dan Roth. EMNLP 2021. The Winograd Schema (WS) has been proposed as a test for measuring commonsense capabilities of models. Recently, pre-trained language model-based approaches have boosted performance on some WS benchmarks but the source of improvement is still not clear. We…
  • CLIPScore: A Reference-free Evaluation Metric for Image Captioning

    Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi. EMNLP 2021. Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in stark contrast to the reference-free manner in which humans assess caption quality. In this…
  • Competency Problems: On Finding and Removing Artifacts in Language Data

    Matt Gardner, William Merrill, Jesse Dodge, Matthew Peters, Alexis Ross, Sameer Singh and Noah A. Smith. EMNLP 2021. Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have “spurious” instead of legitimate correlations is typically left unspecified. In this…
  • Contrastive Explanations for Model Interpretability

    Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg. EMNLP 2021. Contrastive explanations clarify why an event occurred in contrast to another. They are more inherently intuitive to humans to both produce and comprehend. We propose a methodology to produce contrastive explanations for classification models by modifying the…
  • Cross-Document Language Modeling

    Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan. Findings of EMNLP 2021. We introduce a new pretraining approach for language models that are geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these tasks with two key ideas. First, we pretrain with multiple…
  • Documenting the English Colossal Clean Crawled Corpus

    Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Matt Gardner. EMNLP 2021. As language models are trained on ever more text, researchers are turning to some of the largest corpora available. Unlike most other types of datasets in NLP, large unlabeled text corpora are often presented with minimal documentation, and best practices for…
  • Explaining Answers with Entailment Trees

    Bhavana Dalvi, Peter A. Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, Peter Clark. EMNLP 2021. Our goal, in the context of open-domain textual question-answering (QA), is to explain answers by not just listing supporting textual evidence (“rationales”), but also showing how such evidence leads to the answer in a systematic way. If this could be done…