Papers

Viewing 21-30 of 553 papers
  • proScript: Partially Ordered Scripts Generation

    Keisuke Sakaguchi, Chandra Bhagavatula, R. L. Bras, Niket Tandon, P. Clark, Yejin Choi. Findings of EMNLP 2021. Scripts, standardized event sequences describing typical everyday activities, have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information. However, to date they have proved hard to author or… more
  • Transformer Feed-Forward Layers Are Key-Value Memories

    Mor Geva, R. Schuster, Jonathan Berant, Omer Levy. EMNLP 2021. Feed-forward layers constitute two-thirds of a transformer model’s parameters, yet their role in the network remains underexplored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates… more
  • What's in your Head? Emergent Behaviour in Multi-Task Transformer Models

    Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant. EMNLP 2021. The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model, and add a small, thin network (head) per task. Given an input, a target head is the head that is selected for… more
  • Analyzing Commonsense Emergence in Few-shot Knowledge Models

    Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena D. Hwang, Yejin Choi. AKBC 2021. Publicly available, large pretrained Language Models (LMs) generate text with remarkable quality, but only sequentially from left to right. As a result, they are not immediately applicable to generation tasks that break the unidirectional assumption, such as… more
  • SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

    Arie Cattan, Sophie Johnson, Daniel S. Weld, Ido Dagan, Iz Beltagy, Doug Downey, Tom Hope. AKBC 2021. Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which do not often involve… more
  • Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study

    Rahul Nadkarni, David Wadden, Iz Beltagy, Noah A. Smith, Hannaneh Hajishirzi, Tom Hope. AKBC 2021. Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug design and repurposing. Recent work has shown that general… more
  • Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

    Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang. ACL-IJCNLP 2021. Is it possible to use natural language to intervene in a model’s behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social… more
  • Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference

    Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson. Findings of ACL 2021. Multilingual transformers (XLM, mT5) have been shown to have remarkable transfer skills in zero-shot settings. Most transfer studies, however, rely on automatically translated resources (XNLI, XQuAD), making it hard to discern the particular linguistic… more
  • ReadOnce Transformers: Reusable Representations of Text for Transformers

    Shih-Ting Lin, Ashish Sabharwal, Tushar Khot. ACL 2021. While large-scale language models are extremely effective when directly fine-tuned on many end-tasks, such models learn to extract information and solve the task simultaneously from end-task supervision. This is wasteful, as the general problem of gathering… more
  • Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

    Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith. ICLR 2021. State-of-the-art neural machine translation models generate outputs autoregressively, where every step conditions on the previously generated tokens. This sequential nature causes inherent decoding latency. Non-autoregressive translation techniques, on the… more