Papers

  • A Dataset for N-ary Relation Extraction of Drug Combinations

    Aryeh Tiktinsky, Vijay Viswanathan, Danna Niezni, Dana Azagury, Yosi Shamay, Hillel Taub-Tabib, Tom Hope, Yoav Goldberg
    NAACL 2022
    Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available…
  • Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

    Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith
    NAACL 2022
    Warning: this paper discusses and contains content that is offensive or upsetting. The perceived toxicity of language can vary based on someone’s identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in…
  • Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand

    Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison, Alexander R. Fabbri, Yejin Choi, Noah A. Smith
    NAACL 2022
    Natural language processing researchers have identified limitations of evaluation methodology for generation tasks, with new questions raised about the validity of automatic metrics and of crowdworker judgments. Meanwhile, efforts to improve generation models…
  • DEMix Layers: Disentangling Domains for Modular Language Modeling

    Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
    NAACL 2022
    We introduce a new domain expert mixture (DEMIX) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMIX layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular…
  • DREAM: Improving Situational QA by First Elaborating the Situation

    Yuling Gu, Bhavana Dalvi Mishra, Peter Clark
    NAACL 2022
    When people answer questions about a specific situation, e.g., "I cheated on my mid-term exam last week. Was that wrong?", cognitive science suggests that they form a mental picture of that situation before answering. While we do not know how language models…
  • Few-Shot Self-Rationalization with Natural Language Prompts

    Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
    Findings of NAACL 2022
    Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free…
  • NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

    Ximing Lu, S. Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith, Yejin Choi
    NAACL 2022
    The dominant paradigm for neural text generation is left-to-right decoding from autoregressive language models. Constrained or controllable generation under complex lexical constraints, however, requires foresight to plan ahead feasible future paths. Drawing…
  • Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

    Peter West, Chandrasekhar Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, S. Welleck, Yejin Choi
    NAACL 2022
    The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine…
  • Time Waits for No One! Analysis and Challenges of Temporal Misalignment

    Kelvin Luu, Daniel Khashabi, Suchin Gururangan, Karishma Mandyam, Noah A. Smith
    NAACL 2022
    When an NLP model is trained on text data from one time period and tested or deployed on data from another, the resulting temporal misalignment can degrade end-task performance. In this work, we establish a suite of eight diverse tasks across different…
  • Transparent Human Evaluation for Image Captioning

    Jungo Kasai, Keisuke Sakaguchi, Lavinia Dunagan, Jacob Morrison, Ronan Le Bras, Yejin Choi, Noah A. Smith
    NAACL 2022
    We establish a rubric-based human evaluation protocol for image captioning models. Our scoring rubrics and their definitions are carefully developed based on machine- and human-generated captions on the MSCOCO dataset. Each caption is evaluated along two main…