Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation

Yikai Zhang, Siyu Yuan, Caiyu Hu, Jiangjie Chen
2024
ACL 2024

Despite remarkable advancements in emulating human-like behavior through Large Language Models (LLMs), current textual simulations do not adequately address the notion of time. To this end, we… 

OLMo: Accelerating the Science of Language Models

Dirk Groeneveld, Iz Beltagy, Pete Walsh, Hanna Hajishirzi
2024
ACL 2024

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off,… 

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Kyle Lo
2024
ACL 2024

Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often… 

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Jesse Dodge
2023
arXiv.org

The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks,… 

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal
2023
EMNLP

Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and… 

Self-Refine: Iterative Refinement with Self-Feedback

Aman Madaan, Niket Tandon, Prakhar Gupta, Peter Clark
2023
NeurIPS

Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for… 

A Logic for Expressing Log-Precision Transformers

William Merrill, Ashish Sabharwal
2023
NeurIPS

One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed… 

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Hanna Hajishirzi
2023
NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with… 

Editing Common Sense in Transformers

Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Niket Tandon*
2023
EMNLP

Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training. However, these editing methods have only been evaluated on… 

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Ashish Sabharwal
2023
EMNLP

When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer…