Open Language Model: OLMo
A highly performant, truly open LLM and framework, intentionally designed to provide access to the data, training code, models, and evaluation code needed to advance AI and study language models collectively.
What OLMo provides for researchers and developers
More transparency
With full insight into the training data behind the model, researchers can work more efficiently and avoid relying on qualitative assumptions about how the model performs.
Less carbon
By opening the full training and evaluation ecosystem, we can radically reduce redundant development effort, a critical step in decarbonizing AI.
Lasting impact
By keeping models and their datasets in the open rather than hidden behind APIs, we enable researchers to learn from and build on previous models and work.
Now is the time for truly open AI research
Data - Dolma
To support the study of the relationship between the data and any model trained on it, we release Dolma, the pretraining dataset powering OLMo. Dolma is an open dataset built from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. To date, we have released multiple versions of Dolma, each improving on the previous one with more diverse and higher-quality data. All versions of Dolma are openly available for download from the Hugging Face Hub; a minimal loading sketch follows the links below.
Read the Dolma paper to learn more.
Explore our open-source tools to create and refine Dolma.
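As an illustration only (not Dolma's official tooling), documents can be streamed with the Hugging Face datasets library. The repository name allenai/dolma and the per-document text field are assumptions here; check the Dolma dataset card for the exact identifier and schema.

```python
from datasets import load_dataset

# Stream documents instead of downloading the full corpus to disk.
# Depending on your datasets version, script-based datasets may also
# require trust_remote_code=True.
dolma = load_dataset("allenai/dolma", split="train", streaming=True)

for i, doc in enumerate(dolma):
    # "text" is the assumed field name for the document body.
    print(doc.get("text", "")[:200])
    if i >= 2:
        break
```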
Training - OLMo
OLMo is our series of open language models, which includes full model weights, training code, training logs, training metrics in the form of Weights & Biases logs, and inference code. To date, we have released multiple models at the 1B and 7B scales, trained on 2-3 trillion tokens. For every OLMo model, we have released all code, weights, and 500+ intermediate checkpoints, each supported by tooling that traces back to the exact data used at that point during training. All OLMo weights and code are released under the Apache 2.0 License and are available for download from the Hugging Face Hub; a minimal loading sketch follows the link below.
Read the OLMo paper to learn more.
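As a minimal sketch (not the official OLMo training or inference code), the released weights can be loaded through Hugging Face transformers. The repository name allenai/OLMo-7B and the example revision string are assumptions; see the model cards for the exact identifiers of the intermediate checkpoints.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed repository name; check the model card
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Intermediate checkpoints are exposed as Hub revisions, e.g. (assumed name):
# model = AutoModelForCausalLM.from_pretrained(model_id, revision="step1000-tokens4B")

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```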
Adaptation - Tulu
Tulu is a suite of models and datasets for fine-tuning state-of-the-art language models. Drawing on the latest open datasets, Tulu models and recipes strengthen instruction following, reasoning, and coding abilities. The Tulu suite includes models of many sizes, from 7B to 70B parameters, trained with everything from Direct Preference Optimization (DPO) to Proximal Policy Optimization (PPO). We apply the lessons from the Tulu models to OLMo to produce OLMo Instruct, which is available for download on the Hugging Face Hub; a minimal prompting sketch follows the links below.
We perform adaptation on our post-training datasets, including the Tulu SFT mixture and our cleaned version of UltraFeedback.
Learn more from the original Tulu paper, the Tulu 2 paper, or our latest work unpacking DPO vs. PPO.
Fine-tune your own models with Open Instruct on GitHub.
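As an illustration only, an adapted OLMo Instruct model can be prompted through its chat template with Hugging Face transformers. The repository name allenai/OLMo-7B-Instruct is an assumption; consult the model card for the exact identifier and template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-Instruct"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "Summarize what DPO does in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```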
Evaluation - Paloma
Paloma is a benchmark for evaluating open language models across many different domains, ranging from niche artist communities to Reddit forums on mental health. We have already evaluated a number of models, including six 1B baselines that we trained on different popular corpora (such as Dolma), to understand how language model performance varies across 585 domains. We encourage you to run our standardized inference code on additional models and submit the results to extend our benchmark; a simplified perplexity sketch follows the link below.
Read the Paloma paper to learn more.
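The core measurement Paloma standardizes is perplexity per domain. The sketch below shows the general idea on placeholder documents with an assumed allenai/OLMo-1B checkpoint; Paloma's own inference code fixes details such as document splitting, tokenization, and aggregation that this sketch glosses over.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

documents = ["Example document from one Paloma domain."]  # placeholder data

total_nll, total_tokens = 0.0, 0
for text in documents:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n_tokens = enc["input_ids"].shape[1] - 1  # loss covers next-token predictions
    total_nll += out.loss.item() * n_tokens
    total_tokens += n_tokens

print("perplexity:", math.exp(total_nll / total_tokens))
```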
Evaluation - OLMES
OLMES is a standard for reproducible language model evaluations that is open, practical, completely documented, and applicable to current leaderboards and evaluation code bases. We identify and review the varying factors in evaluation practices adopted by the community and provide recommendations guided by results from the existing literature and by new experiments investigating open questions. OLMES is designed to facilitate robust comparisons of model performance, both during model development and when comparing final, powerful models, and can be used across a range of model sizes, e.g., from 1B to 70B parameters.
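One of the factors OLMES pins down is how multiple-choice questions are scored. The sketch below illustrates a common formulation, ranking answer choices by the log-likelihood the model assigns to each continuation, with an assumed allenai/OLMo-1B checkpoint. It is an example of the kind of choice OLMES standardizes, not the OLMES implementation itself.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-1B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

question = "Question: What gas do plants absorb during photosynthesis?\nAnswer:"
choices = [" Oxygen", " Carbon dioxide", " Nitrogen"]

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum log-probabilities of the continuation tokens given the prefix.

    Assumes the prefix tokenization is a prefix of the full tokenization,
    which usually holds when the continuation starts with a space.
    """
    prefix_ids = tokenizer(prefix, return_tensors="pt")["input_ids"]
    full_ids = tokenizer(prefix + continuation, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of log_probs predicts the token at position i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_positions = range(prefix_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[i, full_ids[0, i + 1]].item() for i in cont_positions)

scores = [continuation_logprob(question, c) for c in choices]
print("model picks:", choices[scores.index(max(scores))])
```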
Get in touch
For questions or feedback, you can reach us at olmo@allenai.org or open an issue on GitHub.
This work was made possible by our partners
AMD, CSC (ICT Solutions for Brilliant Minds), Databricks, the Kempner Institute for the Study of Natural & Artificial Intelligence, and the University of Washington. Additional thanks to EleutherAI, Meta, Stanford CRFM, Together AI, and Hugging Face.