Papers

  • Saturated Transformers are Constant-Depth Threshold Circuits

    William Merrill, Ashish Sabharwal, Noah A. Smith. TACL 2022. Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with *hard* attention are quite limited in power, as…
  • Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

    Ofir Press, Noah A. Smith, M. Lewis. ICLR 2022. Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training? We first show that… (a hedged sketch of the linear-bias attention idea appears after this list)
  • Beam Decoding with Controlled Patience

    Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir Radev, Yejin Choi, Noah A. Smith. arXiv 2022. Text generation with beam search has proven successful in a wide range of applications. The commonly used implementation of beam decoding follows a first come, first served heuristic: it keeps a set of already completed sequences over time steps and stops when… (a hedged sketch of a patience-style stopping rule appears after this list)
  • Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

    Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, A. Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, I. Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, M. Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, S. Mishra, Sujan C. Reddy, Sumanta Patro, Tanay Dixit, Xu-dong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi. arXiv 2022. How can we measure the generalization of models to a variety of unseen tasks when provided with their language instructions? To facilitate progress in this goal, we introduce NATURAL-INSTRUCTIONS v2, a collection of 1,600+ diverse language tasks and…
  • Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

    Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey. GEM Workshop 2022. Abstractive summarization systems today produce fluent and relevant output, but often “hallucinate” statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate…
  • Staged Training for Transformer Language Models

    Sheng Shen, Pete Walsh, K. Keutzer, Jesse Dodge, Matthew E. Peters, Iz Beltagy. ICML 2022. The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount…
  • A Controllable Model of Grounded Response Generation

    Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan. AAAI 2022. Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. This control is essential to ensure that users' semantic intents are satisfied and to impose a degree of specificity…
  • Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models

    Maarten Sap, A. Jafarpour, Yejin Choi, Noah A. Smith, J. Pennebaker, E. Horvitz. arXiv 2022. Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge enables people to interpret story narratives and identify salient events effortlessly. We study differences in the narrative flow…
  • Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow

    Maarten Sap, A. Jafarpour, Yejin Choi, Noah A. Smith, J. Pennebaker, E. Horvitz. Sociology 2022. Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge of narrative event flow enables people to weave together a story. However, comparable computational tools to evaluate the flow of…
  • Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

    Daniel Khashabi, Shan Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi. NAACL 2022. Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous…
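
The "Train Short, Test Long" entry above names its technique in the title: attention with linear biases (ALiBi). As a rough illustration of that general idea only, the NumPy sketch below adds a distance-proportional penalty to the pre-softmax scores of single-head causal attention; the function names, toy shapes, and slope schedule are illustrative assumptions, not code from the paper.

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    """Geometric head-specific slopes (2^-8/n, 2^-16/n, ...); a common
    choice for power-of-two head counts, shown here as an assumption."""
    return np.array([2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])

def attention_with_linear_biases(q, k, v, slope):
    """Single-head causal attention with a linear distance penalty added
    to the scores, illustrating the core linear-bias idea."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (seq_len, seq_len)
    distance = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    scores = scores - slope * np.abs(distance)          # penalize distant tokens linearly
    scores = np.where(distance < 0, -np.inf, scores)    # causal mask (no future tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# toy usage
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
out = attention_with_linear_biases(q, k, v, slope=alibi_slopes(8)[0])
print(out.shape)  # (5, 8)
```

Because the penalty depends only on the distance between positions rather than on absolute position indices, the same scoring rule can be applied unchanged to sequences longer than those seen during training, which is the extrapolation property the title refers to.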
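The "Beam Decoding with Controlled Patience" entry describes the usual first come, first served stopping rule for beam decoding. As a hedged sketch only, the toy implementation below shows what a patience-controlled variant could look like, assuming the patience factor simply scales how many finished hypotheses are collected before search stops; the `step_scores` interface, parameter names, and defaults are illustrative assumptions, not the paper's code.

```python
import math

def beam_search_with_patience(step_scores, beam_size=4, patience=2.0, max_steps=20, eos="</s>"):
    """Toy beam search whose stopping rule is controlled by a patience factor:
    rather than stopping as soon as `beam_size` finished hypotheses exist,
    keep decoding until the finished set holds about `patience * beam_size`
    hypotheses or `max_steps` is reached.

    `step_scores(prefix)` -> list of (token, log_prob) continuations.
    """
    beams = [((), 0.0)]                      # (token tuple, cumulative log-prob)
    finished = []
    target = math.ceil(patience * beam_size)

    for _ in range(max_steps):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_scores(prefix):
                candidates.append((prefix + (token,), score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates:
            if prefix[-1] == eos:
                finished.append((prefix, score))   # move completed hypotheses aside
            elif len(beams) < beam_size:
                beams.append((prefix, score))      # keep top-k live hypotheses
        if len(finished) >= target or not beams:
            break

    pool = finished or beams
    return max(pool, key=lambda c: c[1])

# toy usage with a fixed three-token scorer
def step_scores(prefix):
    return [("a", math.log(0.5)), ("b", math.log(0.3)), ("</s>", math.log(0.2))]

print(beam_search_with_patience(step_scores, beam_size=2, patience=2.0, max_steps=5))
```

With patience set to 1.0 this reduces to stopping as soon as the finished set reaches the beam size; larger values keep the search running longer in exchange for a bigger pool of completed candidates to choose from.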