Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

Jaehun Jung, Faeze Brahman, Yejin Choi

2025

International Conference on Learning Representations

We present a principled approach to provide LLM-based evaluation with a rigorous guarantee of human agreement. We first propose that a reliable evaluation method should not uncritically rely on…

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin, Yuntian Deng, K. Chandu, Yejin Choi

2025

ICLR

We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully…

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Jiacheng Liu, Taylor Blanton, Yanai Elazar, Jesse Dodge

2025

ACL 2025 Demo Track

We present OLMoTrace, the first system that traces the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace finds and shows verbatim matches…

OLMoE: Open Mixture-of-Experts Language Models

Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Hanna Hajishirzi

2025

arXiv.org

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain…

2 OLMo 2 Furious

Pete Walsh, Luca Soldaini, Dirk Groeneveld, Hanna Hajishirzi

2025

arXiv.org

We present OLMo 2, the next generation of our fully open language models. OLMo 2 includes dense autoregressive models with improved architecture and training recipe, pretraining data mixtures, and…

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Noah A. Smith

2024

NeurIPS

In multilingual settings, non-Latin scripts and low-resource languages are usually disadvantaged in terms of language models' utility, efficiency, and cost. Specifically, previous studies have…

Paloma: A Benchmark for Evaluating Language Model Fit

Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Jesse Dodge

2024

NeurIPS

Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains – varying distributions of…

The Art of Saying No: Contextual Noncompliance in Language Models

Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Hannaneh Hajishirzi

2024

NeurIPS Datasets & Benchmarks

Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the…

Tülu 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Daniel Morrison, Valentina Pyatkin, Hanna Hajishirzi

2024

arXiv

Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary…

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky

2024

EMNLP

Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from…
