Papers

  • Extracting Latent Steering Vectors from Pretrained Language Models

    Nishant Subramani, Nivedita Suresh, Matthew E. Peters · ACL Findings · 2022. Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to…
  • Generated Knowledge Prompting for Commonsense Reasoning

    Jiacheng Liu, Alisa Liu, Ximing Lu, S. Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi · ACL · 2022. Despite their ability to capture large amounts of knowledge during pretraining, large-scale language models often benefit from incorporating external knowledge bases, especially on commonsense reasoning tasks. This motivates us to explore how we can best…
  • Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets

    Yuxiang Wu, Matt Gardner, Pontus Stenetorp, Pradeep Dasigi · ACL · 2022. Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on, while not generalising to different task distributions. We…
  • Generating Scientific Definitions with Controllable Complexity

    Tal August, Katharina Reinecke, Noah A. Smith · ACL · 2022. Unfamiliar terminology and complex language can present barriers to understanding science. Natural language processing stands to help address these issues by automatically defining unfamiliar terms. We introduce a new task and dataset for defining scientific…
  • Saturated Transformers are Constant-Depth Threshold Circuits

    William Cooper Merrill, Ashish Sabharwal, Noah A. Smith · TACL · 2022. Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with *hard* attention are quite limited in power, as…
  • Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

    Ofir Press, Noah A. Smith, M. Lewis · ICLR · 2022. Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences longer than those it saw during training? We first show that… (a sketch of the linear-bias mechanism named in the title appears after this list)
  • Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

    Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, A. Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, I. Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, M. Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, S. Mishra, Sujan C. Reddy, Sumanta Patro, Tanay Dixit, Xu-dong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi · arXiv · 2022. How can we measure the generalization of models to a variety of unseen tasks when provided with their language instructions? To facilitate progress in this goal, we introduce NATURAL-INSTRUCTIONS v2, a collection of 1,600+ diverse language tasks and…
  • Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

    Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey · arXiv · 2022. Abstractive summarization systems today produce fluent and relevant output, but often “hallucinate” statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate…
  • A Controllable Model of Grounded Response Generation

    Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan · AAAI · 2022. Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. This control is essential to ensure that users' semantic intents are satisfied and to impose a degree of specificity…
  • Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models

    Maarten Sap, A. Jafarpour, Yejin Choi, Noah A. Smith, J. Pennebaker, E. Horvitz · arXiv · 2022. Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge enables people to interpret story narratives and identify salient events effortlessly. We study differences in the narrative flow…
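
The "Train Short, Test Long" entry above names its mechanism in the title: attention with linear biases (ALiBi), which drops positional embeddings and instead penalizes each attention score in proportion to the query-key distance, with a fixed per-head slope. The following is a minimal sketch of that idea, not the authors' released code; the slope recipe follows the published description for power-of-two head counts, and the usage comment assumes a standard causal-attention layout.

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence starting at 2**(-8 / n_heads); for 8 heads this
    # gives 1/2, 1/4, ..., 1/256, per the paper's recipe for power-of-two
    # head counts.
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # bias[h, i, j] = -slope[h] * (i - j): zero on the diagonal, a growing
    # penalty as key j falls further behind query i. Future positions
    # (j > i) are clamped to zero; the causal mask hides them anyway.
    slopes = alibi_slopes(n_heads)                          # (H,)
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)   # (L, L)
    return -slopes[:, None, None] * distance                # (H, L, L)

# Usage sketch inside a causal attention layer, where `scores` holds the
# (batch, H, L, L) scaled query-key dot products and `causal_mask` is True
# at disallowed (future) positions:
#     scores = scores + alibi_bias(n_heads, seq_len)
#     attn = scores.masked_fill(causal_mask, float("-inf")).softmax(dim=-1)
```

Because the penalty grows linearly with distance, nearby tokens dominate attention at any sequence length, which is what lets a model trained on short inputs extrapolate to longer ones at inference time.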