Language models - OLMo 2

OLMo 2 is a family of fully-open language models, developed start-to-finish with open and accessible training data, open-source training code, reproducible training recipes, transparent evaluations, intermediate checkpoints, and more.

What OLMo 2 provides for researchers and developers

Models

Explore the collection of fully-open OLMo 2 models, including both pretrained and instruction-tuned variants.
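
As an illustration, the released checkpoints can be loaded with the Hugging Face transformers library. The sketch below assumes the Hub ID of the 7B base model; substitute the ID of whichever pretrained or instruction-tuned variant you want to explore.

```python
# Minimal sketch: load an OLMo 2 checkpoint via Hugging Face transformers.
# The model ID below is an assumption; check the OLMo 2 collection on the
# Hugging Face Hub for the exact names of the released variants.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed ID; instruct variants add a suffix

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```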

Data

Download and explore the underlying training data used across all stages of development, including pre-training, mid-training, and post-training. This often-hidden ingredient behind model capabilities is made freely available to support open scientific research.
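
As a sketch of how the data can be inspected without downloading an entire corpus, the snippet below streams a few documents with the Hugging Face datasets library. The dataset ID shown is an assumption; consult the OLMo 2 data release for the exact repository names of the pre-training, mid-training, and post-training mixes.

```python
# Minimal sketch: stream a sample of the OLMo 2 pre-training mix.
# The dataset ID "allenai/olmo-mix-1124" is assumed here; see the OLMo 2
# data release for the actual repository names.
from itertools import islice

from datasets import load_dataset

dataset = load_dataset("allenai/olmo-mix-1124", split="train", streaming=True)

# Inspect a handful of documents without materializing the full corpus.
for example in islice(dataset, 3):
    print(example.keys())
```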

Training

Use and extend our high-performance training code for OLMo 2, which we rely on internally for high-stakes language model training and experimentation.

Evaluation

Inspect the code and data used to produce OLMo 2’s results, which we make openly available for scientific reproduction and scrutiny.

Our philosophy

Early work on pretraining language models used only a single stage of pretraining on trillions of tokens of unstructured text from massive web crawls. More sophisticated approaches have since emerged, such as mid-training, data curricula, and attention to the relationship between training stability and performance, yet most successful models offer limited information on how to employ these techniques. By openly sharing our data, recipes, and findings, we hope to provide the open-source community with the resources needed to discover new and innovative approaches to improve model pretraining.
