Papers

Viewing 1-10 of 184 papers
  • Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

    Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith. NAACL 2022. Warning: this paper discusses and contains content that is offensive or upsetting. The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in…
  • Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand

    Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith. NAACL 2022. Natural language processing researchers have identified limitations of evaluation methodology for generation tasks, with new questions raised about the validity of automatic metrics and of crowdworker judgments. Meanwhile, efforts to improve generation models…
  • DEMix Layers: Disentangling Domains for Modular Language Modeling

    Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer. NAACL 2022. We introduce a new domain expert mixture (DEMIX) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMIX layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular…
  • Efficient Hierarchical Domain Adaptation for Pretrained Language Models

    Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge. NAACL 2022. The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is…
  • Few-Shot Self-Rationalization with Natural Language Prompts

    Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters. Findings of NAACL 2022. Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free…
  • MultiVerS: Improving scientific claim verification with weak supervision and full-document context

    David Wadden, Kyle Lo, Lucy Lu Wang, Arman Cohan, Iz Beltagy, Hannaneh Hajishirzi. Findings of NAACL 2022. The scientific claim verification task requires an NLP system to label scientific documents which Support or Refute an input claim, and to select evidentiary sentences (or rationales) justifying each predicted label. In this work, we present MultiVerS, which…
  • NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

    Ximing Lu, S. Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi. NAACL 2022.
    Best Paper Award
    The dominant paradigm for neural text generation is left-to-right decoding from autoregressive language models. Constrained or controllable generation under complex lexical constraints, however, requires foresight to plan ahead feasible future paths. Drawing…
  • Time Waits for No One! Analysis and Challenges of Temporal Misalignment

    Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith. NAACL 2022. When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance. In this work, we establish a suite of eight diverse tasks across different…
  • Transparent Human Evaluation for Image Captioning

    Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith. NAACL 2022. We establish a rubric-based human evaluation protocol for image captioning models. Our scoring rubrics and their definitions are carefully developed based on machine- and human-generated captions on the MSCOCO dataset. Each caption is evaluated along two main…
  • Data Governance in the Age of Large-Scale Data-Driven Language Technology

    Yacine Jernite, Huu Nguyen, Stella Rose Biderman, A. Rogers, Maraim Masoud, V. Danchev, Samson Tan, A. Luccioni, Nishant Subramani, Gérard Dupont, Jesse Dodge, Kyle Lo, Zeerak Talat, Isaac Johnson, Dragomir R. Radev, Somaieh Nikpoor, Jorg Frohberg, Aaron Gokaslan, Peter Henderson, Rishi Bommasani, Margaret Mitchell. FAccT 2022. The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data…