Papers

Viewing 11-20 of 196 papers
  • Attentiveness to Answer Choices Doesn't Always Entail High QA Accuracy

    Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal. arXiv, 2023. When large language models (LMs) are applied in zero- or few-shot settings to discriminative tasks such as multiple-choice questions, their attentiveness (i.e., probability mass) is spread across many vocabulary tokens that are not valid choices. Such a…
  • CSTS: Conditional Semantic Textual Similarity

    A. Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, A. Kalyan, Danqi Chen, Karthik Narasimhan. arXiv, 2023. Semantic textual similarity (STS) has been a cornerstone task in NLP that measures the degree of similarity between a pair of sentences, with applications in information retrieval, question answering, and embedding methods. However, it is an inherently…
  • Editing Commonsense Knowledge in GPT

    Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon. arXiv, 2023. Memory editing methods for updating encyclopedic knowledge in transformers have received increasing attention for their efficacy, specificity, and generalization advantages. However, it remains unclear if such methods can be adapted for the more nuanced…
  • IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

    Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal. arXiv, 2023. Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we…
  • Improving Language Models via Plug-and-Play Retrieval Feedback

    Wenhao Yu, Zhihan Zhang, Zhenwen Liang, Meng Jiang, Ashish Sabharwal. arXiv, 2023. Large language models (LLMs) exhibit remarkable performance across various NLP tasks. However, they often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios. Human feedback has been shown to…
  • Language Models with Rationality

    Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schütze, Peter Clark. arXiv, 2023. While large language models (LLMs) are proficient at question-answering (QA), the dependencies between their answers and other "beliefs" they may have about the world are typically unstated, and may even be in conflict. Our goal is to uncover such dependencies…
  • Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

    Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata. arXiv, 2023. We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the…
  • Complexity-Based Prompting for Multi-Step Reasoning

    Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. ICLR, 2023. We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer…
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal. ICLR, 2023. Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn…
  • Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

    A. Deshpande, Vishvak Murahari, Tanmay Rajpurohit, A. Kalyan, Karthik Narasimhan. arXiv, 2023. Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with…