Papers

  • Demystifying Prompts in Language Models via Perplexity Estimation

    Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer. EMNLP Findings, 2023. Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best prompts. In this work…
  • Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

    Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, Yulia Tsvetkov. EMNLP, 2023. Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The API vendors charge their users based on usage, more…
  • Measuring and Narrowing the Compositionality Gap in Language Models

    Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis. EMNLP Findings, 2023. We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate…
  • Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

    Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hanna Hajishirzi. EMNLP, 2023. Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects on the correctness of LM outputs, and introduce Vera, a…
  • We're Afraid Language Models Aren't Modeling Ambiguity

    Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi. EMNLP, 2023. Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our interpretations as listeners. As language models (LMs) are…
  • Evaluating In-Context Learning of Libraries for Code Generation

    Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi. arXiv, 2023. Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work…
  • Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals

    Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith. arXiv, 2023. The inevitable appearance of spurious correlations in training datasets hurts the generalization of NLP models on unseen data. Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the…
  • The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

    Nathan Lambert, Roberto Calandra. arXiv, 2023. Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to prompt and more capable in complex settings. RLHF at its core provides a new toolkit to optimize LLMs other than next…
  • Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback

    Nathan Lambert, Thomas Krendl Gilbert, Tom Zick. arXiv, 2023. Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that…
  • A taxonomy and review of generalization research in NLP

    D. Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella J. Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin. Nature Machine Intelligence, 2023. The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what ‘good generalisation’ entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In…