Award-Winning Papers

  • Mauve: An Information Divergence Measure Between Neural Text and Human Text

    Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, S. Welleck, Yejin Choi, Z. Harchaoui. NeurIPS 2021. As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We propose Mauve, a comparison measure for open-ended text generation, which directly compares a… (A usage sketch appears after this list.)
  • Specializing Multilingual Language Models: An Empirical Study

    Ethan C. Chau, Noah A. Smith. EMNLP • Workshop on Multilingual Representation Learning, 2021
    Best Paper Honorable Mention
    Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations. In this work, we study the performance, extensibility, and interaction of two such adaptations: vocabulary…
  • SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

    Arie Cattan, Sophie Johnson, Daniel S. Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope. AKBC 2021. Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which do not often involve…
  • All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text

    Elizabeth Clark, Tal August, Sofia Serrano, Nikita Haduong, Suchin Gururangan, Noah A. Smith. ACL 2021. Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text? We run a study assessing non-experts' ability to distinguish between…
  • Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith. ACL 2020. Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target…
  • Social Bias Frames: Reasoning about Social and Power Implications of Language

    Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi. ACL 2020
    WeCNLP Best Paper
    Language has the power to reinforce stereotypes and project social biases onto others. At the core of the challenge is that it is rarely what is stated explicitly, but rather the implied meanings, that frame people's judgments about others. For example, given a…
  • Procedural Reading Comprehension with Attribute-Aware Context Flow

    Aida Amini, Antoine Bosselut, Bhavana Dalvi Mishra, Yejin Choi, Hannaneh Hajishirzi. AKBC 2020. Procedural texts often describe processes (e.g., photosynthesis and cooking) that happen over entities (e.g., light, food). In this paper, we introduce an algorithm for procedural reading comprehension by translating the text into a general formalism that…
  • WinoGrande: An Adversarial Winograd Schema Challenge at Scale

    Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi. AAAI 2020. The Winograd Schema Challenge (WSC), proposed by Levesque et al. (2011) as an alternative to the Turing Test, was originally designed as a pronoun resolution problem that cannot be solved based on statistical patterns in large text corpora. However, recent…
  • Evaluating Question Answering Evaluation

    Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner. EMNLP • MRQA Workshop, 2019. As the complexity of question answering (QA) datasets evolves, moving away from restricted formats like span extraction and multiple-choice (MC) to free-form answer generation, it is imperative to understand how well current metrics perform in evaluating QA…
  • AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

    Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matthew Gardner, Sameer Singh. EMNLP 2019. Neural NLP models are increasingly accurate but are imperfect and opaque: they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model… (A usage sketch appears after this list.)
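
For reference, a minimal sketch of scoring generations with Mauve, assuming the authors' open-source mauve-text package (pip install mauve-text); the compute_mauve entry point and its argument names follow that package's README, and the tiny text lists here are purely illustrative:

    import mauve

    # Toy collections; in practice Mauve is estimated from hundreds or
    # thousands of samples per side, not two.
    human_texts = ["The cat sat on the mat.", "It rained all afternoon."]
    model_texts = ["A cat was sitting on the mat.", "Rain fell through the afternoon."]

    # compute_mauve embeds both collections with a pretrained LM, quantizes
    # the embeddings, and measures the divergence between the two distributions.
    out = mauve.compute_mauve(
        p_text=human_texts,  # human-written text
        q_text=model_texts,  # machine-generated text
        device_id=-1,        # -1 = CPU; pass a GPU index to use CUDA
    )
    print(out.mauve)  # scalar in (0, 1]; higher means closer to human text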
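
Likewise, a hedged sketch of AllenNLP Interpret, assuming the allennlp package's interpret module as it shipped around the paper's release (import paths and the input key below may differ in later versions, and the model path is a placeholder):

    from allennlp.predictors.predictor import Predictor
    from allennlp.interpret.saliency_interpreters import SimpleGradient

    # Wrap any trained AllenNLP model; "path/to/model.tar.gz" is a placeholder,
    # not a specific archive distributed with the paper.
    predictor = Predictor.from_path("path/to/model.tar.gz")

    # SimpleGradient attributes the model's prediction to input tokens
    # via vanilla gradient saliency.
    interpreter = SimpleGradient(predictor)
    saliency = interpreter.saliency_interpret_from_json(
        {"sentence": "AllenNLP Interpret explains this prediction."}
    )
    print(saliency)  # per-token gradient scores, keyed by instance and input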