Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

Nathaniel Weir, Kate Sanders, Orion Weller, Benjamin Van Durme
2024
arXiv

Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on… 

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Chris Callison-Burch
2024
arXiv

Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often uncalibrated inherently and elude conventional… 

TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation

Yikai Zhang, Siyu Yuan, Caiyu Hu, Jiangjie Chen
2024
ACL 2024

Despite remarkable advancements in emulating human-like behavior through Large Language Models (LLMs), current textual simulations do not adequately address the notion of time. To this end, we… 

OLMo: Accelerating the Science of Language Models

Dirk Groeneveld, Iz Beltagy, Pete Walsh, Hanna Hajishirzi
2024
ACL 2024

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off,… 

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Kyle Lo
2024
ACL 2024

Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often… 

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Jesse Dodge
2023
arXiv

The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks,… 

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal
2023
EMNLP

Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and… 

Self-Refine: Iterative Refinement with Self-Feedback

Aman Madaan, Niket Tandon, Prakhar Gupta, Peter Clark
2023
NeurIPS

Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for… 

A Logic for Expressing Log-Precision Transformers

William Merrill, Ashish Sabharwal
2023
NeurIPS

One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed… 

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Hanna Hajishirzi
2023
NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with…