Papers
Viewing 11-20 of 173 papers
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani, Nivedita Suresh, Matthew E. Peters • Findings of ACL • 2022
Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to…

Generated Knowledge Prompting for Commonsense Reasoning
Jiachen Liu, Alisa Liu, Ximing Lu, S. Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi • ACL • 2022
Despite their ability to capture large amounts of knowledge during pretraining, large-scale language models often benefit from incorporating external knowledge bases, especially on commonsense reasoning tasks. This motivates us to explore how we can best…

Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu, Matt Gardner, Pontus Stenetorp, Pradeep Dasigi • ACL • 2022
Natural language processing models often exploit spurious correlations between task-independent features and labels in datasets to perform well only within the distributions they are trained on, while not generalising to different task distributions. We…

Generating Scientific Definitions with Controllable Complexity
Tal August, Katharina Reinecke, Noah A. Smith • ACL • 2022
Unfamiliar terminology and complex language can present barriers to understanding science. Natural language processing stands to help address these issues by automatically defining unfamiliar terms. We introduce a new task and dataset for defining scientific…

Saturated Transformers are Constant-Depth Threshold Circuits
William Cooper Merrill, Ashish Sabharwal, Noah A. Smith • TACL • 2022
Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with *hard* attention are quite limited in power, as…

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis • ICLR • 2022
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training? We first show that…

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, A. Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, I. Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, M. Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, S. Mishra, Sujan C. Reddy, Sumanta Patro, Tanay Dixit, Xu-dong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi • arXiv • 2022
How can we measure the generalization of models to a variety of unseen tasks when provided with their language instructions? To facilitate progress in this goal, we introduce Natural-Instructions v2, a collection of 1,600+ diverse language tasks and…

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey • arXiv • 2022
Abstractive summarization systems today produce fluent and relevant output, but often “hallucinate” statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate…

A Controllable Model of Grounded Response Generation
Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan • AAAI • 2022
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. This control is essential to ensure that users' semantic intents are satisfied and to impose a degree of specificity…

Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models
Maarten Sap, A. Jafarpour, Yejin Choi, Noah A. Smith, J. Pennebaker, E. Horvitz • arXiv • 2022
Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge enables people to interpret story narratives and identify salient events effortlessly. We study differences in the narrative flow…