Papers

  • Machine Reading Comprehension using Case-based Reasoning

    Dung Ngoc Thai, Dhruv Agarwal, Mudit Chaudhary, Rajarshi Das, M. Zaheer, J. Lee, Hannaneh Hajishirzi, A. McCallum. EMNLP 2023. We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds upon the hypothesis that contextualized answers to similar…
  • Measuring and Narrowing the Compositionality Gap in Language Models

    Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis. EMNLP Findings 2023. We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate…
  • SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks

    Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi. EMNLP 2023. We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments…
  • TaskWeb: Selecting Better Source Tasks for Multi-task NLP

    Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi. EMNLP 2023. Recent work in NLP has shown promising results in training models on large numbers of tasks to achieve better generalization. However, it is not well understood how tasks are related, and how helpful training tasks can be chosen for a new task. In this work…
  • Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

    Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi. EMNLP 2023. Despite the much-discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects on the correctness of LM outputs, and introduce Vera, a…
  • We're Afraid Language Models Aren't Modeling Ambiguity

    Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi. EMNLP 2023. Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our interpretations as listeners. As language models (LMs) are…
  • Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals

    Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi Chandu, Vivek Srikumar, Sameer Singh, Noah A. Smith. arXiv 2023. The inevitable appearance of spurious correlations in training datasets hurts the generalization of NLP models on unseen data. Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the…
  • The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

    Nathan Lambert, Roberto Calandra. arXiv 2023. Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to prompt and more capable in complex settings. At its core, RLHF provides a new toolkit to optimize LLMs other than next…
  • Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback

    Nathan Lambert, Thomas Krendl Gilbert, Tom Zick. arXiv 2023. Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that…
  • A taxonomy and review of generalization research in NLP

    D. Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella J. Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin. Nature Machine Intelligence 2023. The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what ‘good generalisation’ entails and how it should be evaluated is not well understood, nor are there any evaluation standards for generalisation. In…