Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

A Survey on Data Selection for Language Models

Alon Albalak, Yanai Elazar, Sang Michael Xie, William Yang Wang

2024

arXiv

A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available…

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Chris Callison-Burch

2024

arXiv

Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often inherently uncalibrated and elude conventional…

OLMo: Accelerating the Science of Language Models

Dirk Groeneveld, Iz Beltagy, Pete Walsh, Hanna Hajishirzi

2024

ACL 2024

Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off,…

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Kyle Lo

2024

ACL 2024

Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often…

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Jesse Dodge

2023

arXiv

The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks,…

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Zeqiu Wu, Yushi Hu, Weijia Shi, Hanna Hajishirzi

2023

NeurIPS

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF), where human…

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Hanna Hajishirzi

2023

NeurIPS

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with…

RealTime QA: What's the Answer Right Now?

Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Kentaro Inui

2023

NeurIPS

We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). RealTime QA inquires about the…

Crystal: Introspective Reasoners Reinforced with Self-Feedback

Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Asli Celikyilmaz

2023

EMNLP

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the…

Demystifying Prompts in Language Models via Perplexity Estimation

Hila Gonen, Srini Iyer, Terra Blevins, Luke Zettlemoyer

2023

EMNLP Findings

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand…
