Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


OLMo: Accelerating the Science of Language Models

Dirk Groeneveld, Iz Beltagy, Pete Walsh, Hanna Hajishirzi
2024
ACL 2024

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off,… 

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Kyle Lo
2024
ACL 2024

Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often… 

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Jesse Dodge
2023
arXiv.org

The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks,… 
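
The unified-evaluation idea behind Catwalk can be illustrated with a short sketch; this is not Catwalk's actual API, and the Task interface, exact_match metric, and toy model below are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical common interface: each task supplies examples and a metric,
# so a single loop can evaluate one model across many datasets.
@dataclass
class Task:
    name: str
    examples: list[tuple[str, str]]      # (input, gold) pairs
    metric: Callable[[str, str], float]  # per-example score

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())

def evaluate(model: Callable[[str], str], tasks: list[Task]) -> dict[str, float]:
    """Run one model over many tasks through the shared interface."""
    results = {}
    for task in tasks:
        scores = [task.metric(model(x), y) for x, y in task.examples]
        results[task.name] = sum(scores) / len(scores)
    return results

# Toy usage: a trivial "model" evaluated on two tiny tasks.
echo_model = lambda x: x
tasks = [
    Task("copy", [("yes", "yes"), ("no", "no")], exact_match),
    Task("negate", [("yes", "no")], exact_match),
]
print(evaluate(echo_model, tasks))  # {'copy': 1.0, 'negate': 0.0}
```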

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Zeqiu Wu, Yushi Hu, Weijia Shi, Hanna Hajishirzi
2023
NeurIPS

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human… 
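
The fine-grained-feedback idea lends itself to a small sketch: instead of one scalar reward per response, each segment gets scores from several reward dimensions. This is not the paper's implementation; the two reward functions below are hypothetical stand-ins for learned reward models:

```python
# Sketch of the fine-grained reward idea: score each sentence under
# several feedback dimensions and combine into a dense, segment-level
# signal for RL. Both reward functions are hypothetical stand-ins.
def relevance_reward(sentence: str, prompt: str) -> float:
    words = prompt.lower().replace("?", "").replace(".", "").split()
    return 1.0 if any(w in sentence.lower() for w in words) else -1.0

def toxicity_penalty(sentence: str) -> float:
    return -1.0 if "stupid" in sentence.lower() else 0.0

def fine_grained_reward(prompt: str, response: str,
                        weights=(1.0, 1.0)) -> list[float]:
    """Per-sentence reward: a weighted sum of dimension-specific scores."""
    sentences = [s for s in response.split(".") if s.strip()]
    return [
        weights[0] * relevance_reward(s, prompt) + weights[1] * toxicity_penalty(s)
        for s in sentences
    ]

rewards = fine_grained_reward(
    "Tell me about otters.",
    "Otters are aquatic mammals. Pineapples are stupid.",
)
print(rewards)  # [1.0, -2.0]: each segment carries its own signal
```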

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Hanna Hajishirzi
2023
NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with… 

RealTime QA: What's the Answer Right Now?

Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Kentaro Inui
2023
NeurIPS

We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). REALTIME QA inquires about the… 

Crystal: Introspective Reasoners Reinforced with Self-Feedback

Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Asli Celikyilmaz
2023
EMNLP

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the… 
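
The introspect-then-reason pattern the abstract describes can be sketched in a few lines; the generate function below is a hypothetical stand-in for a language-model call (here returning canned strings), and the paper's reinforcement loop from self-feedback is omitted:

```python
# Sketch of introspect-then-reason: first make latent knowledge explicit,
# then answer conditioned on it. `generate` is a canned toy stand-in for
# a language-model call.
def generate(prompt: str) -> str:
    canned = {
        "knowledge": "Metal conducts heat faster than wood.",
        "answer": "the metal spoon",
    }
    return canned["knowledge" if "knowledge" in prompt else "answer"]

def introspect_then_reason(question: str) -> tuple[str, str]:
    """Stage 1: state relevant knowledge. Stage 2: answer given it."""
    knowledge = generate(f"State relevant knowledge for: {question}")
    answer = generate(f"{question}\nKnowledge: {knowledge}\nAnswer:")
    return knowledge, answer

k, a = introspect_then_reason("Which spoon feels colder, metal or wood?")
print(k)  # the introspected knowledge also aids interpretability
print(a)
```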

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Luke Zettlemoyer
2023
EMNLP Findings

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand… 
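
The quantity at the heart of the title, prompt perplexity under the model itself, is easy to sketch. GPT-2 via Hugging Face transformers is an illustrative choice here, not the paper's exact setup; the idea is that lower-perplexity phrasings of a prompt tend to perform better:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Perplexity of a prompt under a causal LM: exp of the mean token-level
# negative log-likelihood. GPT-2 serves only as a small stand-in model.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    # With labels=input_ids, the model returns the mean cross-entropy loss.
    loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

# Rank candidate phrasings of the same task by perplexity.
candidates = [
    "Translate English to French:",
    "English to French translation follows.",
]
for p in sorted(candidates, key=perplexity):
    print(f"{perplexity(p):8.1f}  {p}")
```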

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Yulia Tsvetkov
2023
EMNLP

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The… 
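
The disparity the title asks about comes from the same content tokenizing into different numbers of tokens, and hence different per-token API bills. A quick sketch with tiktoken; the cl100k_base encoding and the example sentences are illustrative choices, not the paper's exact setup:

```python
import tiktoken

# APIs bill per token, so the same content can cost more in languages the
# tokenizer fragments heavily. cl100k_base is an illustrative encoding.
enc = tiktoken.get_encoding("cl100k_base")

parallel = {
    "English": "How much does this cost?",
    "German":  "Wie viel kostet das?",
    "Tamil":   "இதன் விலை எவ்வளவு?",
}

baseline = len(enc.encode(parallel["English"]))
for lang, sentence in parallel.items():
    n = len(enc.encode(sentence))
    print(f"{lang:8s} {n:3d} tokens  ({n / baseline:.1f}x English)")
```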

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Sewon Min, Kalpesh Krishna, Xinxi Lyu, Hannaneh Hajishirzi
2023
EMNLP

Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of…
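
The metric named in the title reduces to: decompose a generation into atomic facts, check each against a knowledge source, and report the supported fraction. In this sketch both helpers are hypothetical stand-ins for the paper's LM- and retrieval-based components:

```python
# Sketch of the factual-precision metric: decompose a generation into
# atomic facts and report the fraction a knowledge source supports.
# Both helpers are toy stand-ins for the paper's LM/retrieval steps.
def extract_atomic_facts(generation: str) -> list[str]:
    # Stand-in: treat each sentence as one atomic fact.
    return [s.strip() for s in generation.split(".") if s.strip()]

def is_supported(fact: str, knowledge: set[str]) -> bool:
    return fact in knowledge

def factscore(generations: list[str], knowledge: set[str]) -> float:
    """Mean, over generations, of the fraction of supported atomic facts."""
    per_gen = []
    for g in generations:
        facts = extract_atomic_facts(g)
        per_gen.append(sum(is_supported(f, knowledge) for f in facts) / len(facts))
    return sum(per_gen) / len(per_gen)

kb = {"Marie Curie won two Nobel Prizes", "Marie Curie was born in Warsaw"}
gen = "Marie Curie won two Nobel Prizes. Marie Curie was born in Paris."
print(factscore([gen], kb))  # 0.5: one supported fact, one unsupported
```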