Papers
IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions
Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal
EMNLP • 2023
Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we…

Improving Language Models via Plug-and-Play Retrieval Feedback
Wenhao Yu, Zhihan Zhang, Zhenwen Liang, Meng Jiang, Ashish Sabharwal
arXiv • 2023
Large language models (LLMs) exhibit remarkable performance across various NLP tasks. However, they often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios. Human feedback has been shown to…
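
The abstract is cut off before the method, so purely as a rough illustration of what the title suggests: the model's own draft answer is fed back as a retrieval query, and the answer is then revised against the newly retrieved evidence. A minimal Python sketch; `generate` and `retrieve` are assumed helpers, not the paper's API.

    # Hypothetical sketch of a retrieval-feedback loop; not the paper's code.
    def answer_with_retrieval_feedback(question, generate, retrieve, rounds=2):
        """Iteratively refine an answer by retrieving with the model's own draft."""
        docs = retrieve(question)              # initial retrieval from the question alone
        answer = generate(question, docs)      # first draft answer
        for _ in range(rounds):
            # Feed the draft back as part of the query: it can surface
            # evidence that question-only retrieval misses.
            docs = retrieve(f"{question} {answer}")
            answer = generate(question, docs)  # revise with the new evidence
        return answer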

Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata
arXiv • 2023
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the…
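
The play-reflect-criticize loop the abstract describes can be sketched as follows; `chat(system, history)` is an assumed helper, and the prompts are illustrative rather than the paper's.

    # Hypothetical sketch: two LLM negotiators plus an LLM critic whose advice
    # is injected in-context next round (no weights are updated).
    def negotiate(chat, rounds=3, turns=6):
        feedback = ""
        for _ in range(rounds):
            history = []
            for turn in range(turns):
                role = "buyer" if turn % 2 == 0 else "seller"
                system = f"You are the {role} negotiating the price of an item."
                if role == "buyer" and feedback:
                    system += f" Advice from your coach: {feedback}"
                history.append((role, chat(system, history)))
            # A third model reflects on the transcript and criticizes the buyer.
            feedback = chat("You are an AI coach. Tell the buyer how to get "
                            "a better deal next time.", history)
        return history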

Can AI language models replace human participants?
Danica Dillion, Niket Tandon, Yuling Gu, Kurt Gray
Trends in Cognitive Sciences • 2023
Recent work suggests that language models such as GPT can make human-like judgments across a number of domains. We explore whether and when language models might replace human participants in psychological science. We review nascent research, provide a…

Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
ICLR • 2023
We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer…
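
The proposal, roughly: treat the number of reasoning steps in a chain as its complexity, build the prompt from the most complex annotated chains, and at inference majority-vote over only the most complex sampled chains. A minimal sketch; the step-counting heuristic and function names are illustrative.

    # Hypothetical sketch of complexity-based prompting and voting.
    from collections import Counter

    def complexity(chain: str) -> int:
        """Proxy for reasoning complexity: count of non-empty reasoning steps."""
        return sum(1 for line in chain.splitlines() if line.strip())

    def pick_exemplars(annotated_chains, k=8):
        """Build the few-shot prompt from the k most complex chains."""
        return sorted(annotated_chains, key=complexity, reverse=True)[:k]

    def complexity_weighted_vote(sampled_chains, extract_answer, top_n=5):
        """Majority-vote over the most complex sampled chains only."""
        top = sorted(sampled_chains, key=complexity, reverse=True)[:top_n]
        return Counter(extract_answer(c) for c in top).most_common(1)[0][0]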

Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal
ICLR • 2023
Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn…
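
The modular idea can be sketched as a decomposer that emits one sub-task call at a time, each routed to its own few-shot prompted handler; `llm`, the prompt, and the "name: input" call format are assumptions for illustration, not the paper's exact protocol.

    # Hypothetical sketch of decomposed prompting with a handler registry.
    def decomposed_prompt(question, llm, handlers, max_steps=10):
        """`handlers` maps a sub-task name to a specialized solver."""
        context = question
        for _ in range(max_steps):
            # The decomposer emits one call at a time, e.g. "split: <input>".
            step = llm(f"Decompose into the next sub-task.\nQ: {context}\nNext:")
            name, _, sub_input = step.partition(":")
            if name.strip() == "final":
                return sub_input.strip()        # decomposer signals completion
            result = handlers[name.strip()](sub_input.strip())
            context += f"\n{step} -> {result}"  # append sub-result to the trace
        return context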

Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
A. Deshpande, Vishvak Murahari, Tanmay Rajpurohit, A. Kalyan, Karthik Narasimhan
arXiv • 2023
Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with…

The Parallelism Tradeoff: Limitations of Log-Precision Transformers
William Merrill, Ashish Sabharwal
TACL • 2023
Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and…

Specializing Smaller Language Models towards Multi-Step Reasoning
Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot
ICML • 2023
The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such abilities can, in fact…
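
One common recipe for this kind of specialization is to fine-tune a small model on chain-of-thought data generated by a larger teacher; the sketch below shows only the data-formatting side, and the model choice is illustrative, not necessarily the paper's setup.

    # Hypothetical sketch: formatting teacher-generated chain-of-thought data
    # to specialize a small seq2seq model toward multi-step reasoning.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")  # illustrative
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    def format_example(question, teacher_rationale, answer):
        # Train the small model to emit the full reasoning chain, not just
        # the final answer, so the step-by-step ability transfers.
        source = f"Answer step by step.\nQuestion: {question}"
        target = f"{teacher_rationale}\nSo the answer is {answer}."
        return tokenizer(source, text_target=target, truncation=True)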

Transformers Can Be Expressed In First-Order Logic with Majority
William Merrill, Ashish Sabharwal
arXiv • 2023
Characterizing the implicit structure of the computation within neural networks is a foundational problem in the area of deep learning interpretability. Can the inner decision process of neural networks be captured symbolically in some familiar logic? We show…