
Language models - OLMo 2

OLMo 2 is a family of fully-open language models, developed start-to-finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more.

Models

Explore the collection of fully-open OLMo 2 models, including both pretrained and instruction-tuned variants.

Data

Download and explore the underlying training data used across all stages, including pretraining, mid-training, and post-training. This data is the often-hidden ingredient behind model capabilities, and we make it freely available to support open scientific research.

Training

Use and extend our high-performance training code for OLMo 2, which we rely on internally for high-stakes language model training and experimentation.

Evaluation

Inspect the code and data used to produce OLMo 2’s results, which we make openly available for scientific reproduction and scrutiny.

Blog

Learn more about OLMo 2, the best fully open language model to date.

Release notes

Get the technical details behind this latest release.

Paper

Check out the paper to dive into the making of OLMo 2.

Our philosophy

Early work on pretraining language models used only a single stage of pretraining on trillions of tokens of unstructured text from massive web crawls. More sophisticated approaches have since emerged, such as mid-training, data curricula, and attention to the relationship between training stability and performance, yet most successful models offer little information on how to apply these techniques. By openly sharing our data, recipes, and findings, we hope to give the open-source community the resources needed to discover new and innovative approaches to improving model pretraining.
