Open Language Model: OLMo

A highly performant, truly open LLM and framework, intentionally designed to provide access to the data, training code, models, and evaluation code necessary to advance AI and to study language models collectively.

What OLMo provides for researchers and developers

More transparency

With full insight into the training data behind the model, researchers can work more efficiently and avoid relying on qualitative assumptions about model performance.

Less carbon

By opening the full training and evaluation ecosystem, we can radically reduce redundant development effort, a critical step in decarbonizing AI.

Lasting impact

By keeping models and their datasets in the open rather than hidden behind APIs, we enable researchers to learn and build from previous models and work.

Now is the time for truly open AI research

Data - Dolma

To support the study of the relationship between the data and any model trained on it, we release Dolma, the pretraining dataset powering OLMo. Dolma is an open dataset drawn from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. To date, we have released multiple versions of Dolma, each improving on the previous one with increasingly diverse and higher-quality data. All versions of Dolma are openly available for download from the Hugging Face Hub.

Read the Dolma paper to learn more.

Explore our open-source tools to create and refine Dolma.
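
As a quick illustration (a sketch, not our official tooling), the snippet below streams Dolma from the Hugging Face Hub with the `datasets` library; the `allenai/dolma` dataset ID and the version name are taken from the Hub listing and may change between releases.

```python
# Sketch: stream Dolma from the Hugging Face Hub rather than downloading it,
# since the full corpus is several terabytes of text.
from datasets import load_dataset

# Dataset ID and config name are assumptions; check the Hub for current versions.
dolma = load_dataset("allenai/dolma", name="v1_6", split="train", streaming=True)

# Peek at a few documents and their source metadata.
for i, doc in enumerate(dolma):
    print(doc.get("source"), doc["text"][:200].replace("\n", " "))
    if i >= 4:
        break
```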

Training - OLMo

OLMo is our series of open language models, which includes full model weights, training code, training logs, training metrics in the form of Weights & Biases logs, and inference code. To date, we have released multiple models at the 1B and 7B scales, trained on 2-3 trillion tokens. For all OLMo models, we’ve released all code, weights, and 500+ intermediate checkpoints, each supported by tooling to trace back to the exact data seen at that point in training. All OLMo weights and code are released under the Apache 2.0 License and are available for download from the Hugging Face Hub.

Read the OLMo paper to learn more.
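
As a minimal sketch of getting started (assuming the `allenai/OLMo-7B-hf` model ID on the Hugging Face Hub; older checkpoints may instead require `trust_remote_code=True`), OLMo can be loaded and sampled with standard `transformers` APIs:

```python
# Sketch: load an OLMo checkpoint and generate text with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-hf"  # assumed Hub ID; verify against the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Intermediate checkpoints are typically exposed as revisions of the same Hub repository (passed via the `revision` argument of `from_pretrained`); the exact revision names vary by model.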

Adaptation - Tulu

Tulu is a suite of models and datasets for fine-tuning state-of-the-art language models. Drawing on the latest open datasets, the Tulu recipes equip models with instruction-following, reasoning, and coding abilities. The Tulu suite includes models of many sizes, from 7B to 70B parameters, trained with everything from Direct Preference Optimization (DPO) to Proximal Policy Optimization (PPO). We apply the lessons from the Tulu models to OLMo to produce the OLMo Instruct models, which are available for download on the Hugging Face Hub.

We perform adaptation on our post-training datasets, including the Tulu SFT mixture and our cleaned version of UltraFeedback.

Learn more from the original Tulu paper, the Tulu 2 paper, or our latest work unpacking DPO vs. PPO.

Fine-tune your own models with Open Instruct on GitHub
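
For readers unfamiliar with DPO, the sketch below shows the core of the objective in PyTorch; variable names and the `beta` value are illustrative, and the actual training code lives in Open Instruct.

```python
# Sketch: the core Direct Preference Optimization (DPO) objective.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of (chosen, rejected) completion pairs.

    Each input is the summed log-probability of a completion under either the
    policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer chosen completions over rejected ones,
    # relative to the reference model, with beta controlling the strength.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```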

Evaluation - Paloma

Paloma is a benchmark for evaluating open language models across many different domains, ranging from niche artist communities to Reddit forums on mental health. We have already evaluated several models, including six 1B baselines that we trained on different popular corpora (such as Dolma), to understand how language model performance varies across 585 different domains. We encourage you to run our standardized inference code on additional models and submit the results to extend the benchmark.

Read the Paloma paper to learn more

Explore the evaluation data on Hugging Face

View the evaluation source code on GitHub
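
To make the setup concrete, the sketch below computes a token-weighted perplexity for one domain's documents; the model ID is a placeholder, and Paloma's standardized inference code controls details (tokenization, context length, document ordering) that this sketch glosses over.

```python
# Sketch: per-domain perplexity, the kind of measurement Paloma standardizes.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1B-hf"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def domain_perplexity(texts: list[str]) -> float:
    """Token-weighted perplexity of the model over one domain's documents."""
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
        with torch.no_grad():
            # The returned loss is the mean negative log-likelihood per predicted token.
            loss = model(ids, labels=ids).loss
        n = ids.shape[1] - 1  # number of predicted (shifted) target tokens
        total_nll += loss.item() * n
        total_tokens += n
    return math.exp(total_nll / total_tokens)

print(domain_perplexity(["Example document from one Paloma domain."]))
```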

Evaluation - OLMES

OLMES is a standard for reproducible language model evaluation that is open, practical, completely documented, and applicable to current leaderboards and evaluation codebases. We identify and review the varying factors in evaluation practices adopted by the community and provide recommendations guided by results from the existing literature and new experiments investigating open questions. OLMES is designed to facilitate robust comparisons of model performance, both during model development and when comparing final powerful models, and can be used across a range of model sizes (e.g., from 1B to 70B parameters).

Read the OLMES paper to learn more

View the evaluation source code on GitHub
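
One example of the kind of detail OLMES pins down is how answer choices are scored; the sketch below ranks multiple-choice options by per-character log-likelihood of each continuation. The prompt format, normalization, and model ID here are illustrative assumptions rather than the official OLMES implementation.

```python
# Sketch: score multiple-choice answers by length-normalized log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1B-hf"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def choice_score(question: str, choice: str) -> float:
    """Per-character log-likelihood of `choice` as a continuation of `question`."""
    # Simplification: assumes the prompt tokenization is a prefix of the full tokenization.
    prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = model(full_ids).logits.log_softmax(dim=-1)
    # Log-probability of each continuation token given its prefix.
    cont = full_ids[0, prompt_len:]
    token_logps = logprobs[0, prompt_len - 1 : -1].gather(-1, cont.unsqueeze(-1))
    return token_logps.sum().item() / max(len(choice), 1)

question = "Q: What gas do plants absorb for photosynthesis?\nA:"
choices = [" carbon dioxide", " oxygen", " nitrogen"]
print(max(choices, key=lambda c: choice_score(question, c)))
```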

Get in touch

For questions or feedback, you can reach us at olmo@allenai.org or open an issue on GitHub.

This work was made possible by our partners

AMD, CSC (ICT Solutions for Brilliant Minds), Databricks, the Kempner Institute for the Study of Natural & Artificial Intelligence, and the University of Washington. Additional thanks to EleutherAI, Meta, Stanford CRFM, Together AI, and Hugging Face.