
OLMo Release notes

OLMo 2 November 2024

Release date: November 26, 2024

What's new

OLMo 2 introduces a new family of 7B and 13B models trained on up to 5T tokens, representing the best fully-open language models to date. These models sit at the Pareto frontier of performance and training efficiency, with OLMo 2 7B outperforming Llama 3.1 8B and OLMo 2 13B outperforming Qwen 2.5 7B despite lower total training FLOPs. Check out the artifacts linked below, and read the blog.

Key improvements include:

• Enhanced architecture with RMSNorm, QK-Norm, auxiliary Z-loss, and rotary positional embeddings (see the sketch after this list)

• Two-stage curriculum training using OLMo-Mix-1124 and Dolmino-Mix-1124, with model souping to produce the final checkpoints

• State-of-the-art post-training methodology from Tülu 3

• Evaluated on the OLMES suite

• The Instruct variants are competitive with the best open-weight models, with OLMo 2 13B Instruct outperforming Qwen 2.5 14B Instruct, Tülu 3 8B, and Llama 3.1 8B Instruct.
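For readers who want a concrete picture of two of the architecture changes above, here is a minimal PyTorch sketch of RMSNorm and QK-Norm. It illustrates the general techniques only, not the OLMo 2 implementation; the module names, single-head attention shape, and epsilon value are assumptions made for brevity.

```python
# Minimal sketch of RMSNorm and QK-Norm (illustrative only, not the OLMo 2 code).
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square norm: rescale by 1 / RMS(x), then apply a learned gain."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight


def qk_norm_attention(q, k, v, q_norm: RMSNorm, k_norm: RMSNorm):
    """QK-Norm: normalize queries and keys before the dot product,
    which keeps attention logits in a stable range."""
    q, k = q_norm(q), k_norm(k)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```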

Artifacts:

Demo

OLMo-2-1124-7B

OLMo-2-1124-13B

OLMo-2-1124-7B-Instruct

OLMo-2-1124-13B-Instruct

Pretraining dataset stage 1: OLMo-mix-1124

Pretraining dataset stage 2: Dolmino-mix-1124

Post-training dataset: Tülu 3 SFT Mix

Preference data for OLMo 2 7B

Preference data for OLMo 2 13B

RLVR mix

HuggingFace Collection
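As a quick start for the artifacts above, the sketch below loads the Instruct checkpoint with Hugging Face transformers and generates a short reply. The model id allenai/OLMo-2-1124-7B-Instruct is inferred from the artifact names, and a transformers version recent enough to include OLMo 2 support is assumed.

```python
# Hedged quick-start sketch: load an OLMo 2 Instruct checkpoint with transformers.
# The model id below is assumed from the artifact names in this release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B-Instruct"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-style prompt with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "In one sentence, what is a language model?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```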

OLMoE September 2024

Release date: September 3, 2024

What's new

OLMoE is the first high-quality Mixture-of-Experts LLM that is 100% open source. The model has 1B active parameters and 7B total parameters, and is trained for a total of 5T tokens. Performance-wise, OLMoE is the state of the art among models with a similar active-parameter cost of about 1B, and it even beats a number of larger models, such as Gemma 2, Llama 2 13B Chat, OLMo-7B-0724, and DeepSeekMoE 16B, on common benchmarks like MMLU and AlpacaEval. Check out the main links below and read the blog announcement for more details.
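To make the active-versus-total parameter distinction concrete, here is a minimal PyTorch sketch of a top-k sparse Mixture-of-Experts feed-forward layer: each token is routed to only a few experts, so only a fraction of the total parameters is used per token. This is a generic illustration, not the OLMoE implementation; the expert count, top-k value, and layer sizes are placeholder assumptions.

```python
# Illustrative top-k sparse MoE layer (not the OLMoE code); sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, dim: int, hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Each token is sent to its top-k experts only.
        weights = F.softmax(self.router(x), dim=-1)           # (tokens, num_experts)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)   # (tokens, top_k)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Example: 16 tokens of width 128; only 2 of 8 experts run per token.
moe = TopKMoE(dim=128, hidden=512)
print(moe(torch.randn(16, 128)).shape)  # torch.Size([16, 128])
```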

Main links:

OLMoE-1B-7B-0924

OLMoE-1B-7B-0924-Instruct

OLMoE code

OLMoE pretraining dataset

OLMoE finetuning dataset: tulu-v3.1-mix-preview-4096-OLMoE

OLMoE finetuning dataset: ultrafeedback_binarized_cleaned

OLMoE finetuning code

OLMoE: Open Mixture-of-Experts Language Models

OLMoE blog announcement

OLMo July 2024 (1B and 7B)

Release date: July 31, 2024

📌A quick note on naming: We have opted to update the OLMo naming convention to the following format: model name, model version (whole numbers only), model parameters, followed by the month and year of the release. This naming structure makes updates easier to track over time and is scalable, so we can have infinite OLMos! As an example, OLMo v1.7 is now OLMo April 2024, and the models released today adhere to this updated naming convention.

What's new

• Improvements: OLMo 1B July 2024 shows a 4.4-point increase on HellaSwag, among other evaluation improvements, thanks to an improved version of the Dolma dataset and staged training. OLMo 7B July 2024 also leverages the newest version of the Dolma dataset and is trained with a two-stage curriculum (see the sketch below); the second stage consistently adds 2-3 points of performance. The OLMo July 2024 SFT and Instruct models apply the Tülu 2 recipe to OLMo 7B July 2024 and are generally more capable than OLMo 7B April 2024 and the original OLMo 7B.
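As a rough illustration of the two-stage idea mentioned above, here is a small Python sketch of a staged schedule in which most steps draw from the base pretraining mix and a short final stage switches to a higher-quality mix while the learning rate anneals. The dataset names, step counts, and annealing shape are placeholders, not the released OLMo configurations.

```python
# Conceptual sketch of a two-stage curriculum (placeholder names and numbers,
# not the OLMo training configuration).

def two_stage_schedule(total_steps: int, stage2_steps: int):
    """Yield (step, mix_name, lr_scale): stage 1 on the base mix at full LR,
    stage 2 on a higher-quality mix with the LR scale annealing linearly to 0."""
    stage1_steps = total_steps - stage2_steps
    for step in range(total_steps):
        if step < stage1_steps:
            yield step, "base_pretraining_mix", 1.0
        else:
            progress = (step - stage1_steps) / max(stage2_steps - 1, 1)
            yield step, "high_quality_mix", 1.0 - progress


# Example: 100 steps total, with the final 10 steps on the curated mix.
for step, mix, lr_scale in two_stage_schedule(100, 10):
    pass  # sample a batch from `mix` and multiply the base LR by `lr_scale`
```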

Main links:

OLMo 1B July 2024

OLMo 7B July 2024

OLMo 7B July 2024 SFT

OLMo 7B July 2024 Instruct

OLMo 7B April 2024 (formerly known as OLMo 1.7-7B)

Release date: April 17, 2024

What's new

• Improvements: OLMo 7B April 2024 (previously known as OLMo 1.7-7B) has a longer context length, up from 2048 to 4096 tokens, and is trained on the new Dolma 1.7 dataset. Thanks to the improved Dolma dataset, this model scores 52 on MMLU, sitting above Llama 2 7B and approaching Llama 2 13B, and it outperforms Llama 2 13B on GSM8K.

Main links:

OLMo 7B April 2024

Dolma v1.7

Read our blog announcement here

OLMo February 2024

Release date: February 1, 2024

What's new

Announcing OLMo, Ai2's first Open Language Model. The Ai2 LLM framework is intentionally designed to provide access to the data, training code, models, and evaluation code necessary to advance AI through open research and to empower academics and researchers to collectively study the science of language models. This first batch of OLMo models includes four variants of our language model at the 7B scale, corresponding to different architectures, optimizers, and training hardware, as well as one model at the 1B scale. All variants are trained on at least 2T tokens.

Main links:

OLMo 1B

OLMo 7B

OLMo 7B Instruct

OLMo: Accelerating the Science of Language Models

Read our blog announcement here