Papers
Viewing 1-10 of 203 papers
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, Peter Clark
NeurIPS • 2023
Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback…

A Logic for Expressing Log-Precision Transformers
William Merrill, Ashish Sabharwal
NeurIPS • 2023
One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently…

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hanna Hajishirzi
NeurIPS • 2023
In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied…

Editing Common Sense in Transformers
Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Wenlong Zhao, Xiang Lorraine Li*, Sarah Wiegreffe*, Niket Tandon*
EMNLP • 2023
Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training. However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer…

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy
Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal
EMNLP • 2023
When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms…

Language Models with Rationality
Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schütze, Peter Clark
EMNLP • 2023
While large language models (LLMs) are proficient at question-answering (QA), the dependencies between their answers and other "beliefs" they may have about the world are typically unstated, and may even be in conflict. Our goal is to uncover such…

The Expressive Power of Transformers with Chain of Thought
William Merrill, Ashish Sabharwal
arXiv • 2023
Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after…

Closing the Curious Case of Neural Text Degeneration
Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal
arXiv • 2023
Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of truncation sampling by proving that truncation…

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
Nirbhay Modhe, Qiaozi Gao, A. Kalyan, Dhruv Batra, G. Thattai, G. Sukhatme
arXiv • 2023
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based…

DISCO: Distilling Phrasal Counterfactuals with Large Language Models
Zeming Chen, Qiyue Gao, Kyle Richardson, Antoine Bosselut, Ashish Sabharwal
ACL • 2023
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if…