OLMo Release notes
OLMo 2 November 2024
Release date: November 26, 2024
What's new
OLMo 2 introduces a new family of 7B and 13B models trained on up to 5T tokens, representing the best fully-open language models to date. These models sit at the Pareto frontier of performance and training efficiency, with OLMo 2 7B outperforming Llama-3.1 8B and OLMo 2 13B outperforming Qwen 2.5 7B despite lower total training FLOPs. Check out the artifacts linked below, and read the blog.
Key improvements include:
• Enhanced architecture with RMSNorm, QK-Norm, an auxiliary Z-loss, and rotary positional embeddings (see the sketch after this list)
• Two-stage curriculum training using OLMo-Mix-1124 and Dolmino-Mix-1124, with model souping for the final checkpoints
• State-of-the-art post-training methodology from Tülu 3
• Evaluated on the OLMES suite
• The Instruct variants are competitive with the best open-weight models, with OLMo 2 13B Instruct outperforming Qwen 2.5 14B Instruct, Tülu 3 8B, and Llama 3.1 8B Instruct.
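The architecture items above are small, self-contained pieces. Below is a minimal, illustrative PyTorch sketch of RMSNorm, QK-Norm applied to the attention queries and keys, and the auxiliary Z-loss term; the hidden size, head count, and Z-loss coefficient are placeholder assumptions for illustration, not the actual OLMo 2 configuration.

```python
# Illustrative sketch only -- not the OLMo 2 source. Sizes and the Z-loss
# coefficient are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class QKNormAttention(nn.Module):
    """Causal self-attention with RMSNorm applied to queries and keys (QK-Norm)."""
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, self.head_dim)
        q = self.q_norm(q.reshape(shape)).transpose(1, 2)
        k = self.k_norm(k.reshape(shape)).transpose(1, 2)
        v = v.reshape(shape).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))

def z_loss(logits: torch.Tensor, coeff: float = 1e-4) -> torch.Tensor:
    """Auxiliary Z-loss: penalizes large log-partition values log(sum(exp(logits)));
    added to the cross-entropy loss to keep output logits from drifting."""
    return coeff * torch.logsumexp(logits, dim=-1).pow(2).mean()
```

QK-Norm bounds the magnitude of the attention logits and the Z-loss keeps the output softmax well scaled; both are commonly used to stabilize training at scale.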
Artifacts:
• Demo
• Pretraining dataset stage 1: OLMo-mix-1124
• Pretraining dataset stage 2: Dolmino-mix-1124
• Post-training dataset: Tülu 3 SFT Mix
• Preference data for OLMo 2 7B
• Preference data for OLMo 2 13B
• RLVR mix
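For readers who want to try the released checkpoints directly, a minimal generation example using the Hugging Face transformers library is sketched below. The hub ID allenai/OLMo-2-1124-7B is inferred from the release naming and should be checked against the model card; a recent transformers version with OLMo 2 support is assumed.

```python
# Minimal sketch: greedy generation with an OLMo 2 checkpoint via Hugging Face
# transformers. The hub ID is an assumption based on the release naming; verify
# it against the published model card before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```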
OLMoE September 2024
Release date: September 3, 2024
What's new
• OLMoE is the first good Mixture-of-Experts LLM that is 100% open source. The model has 1B active parameters and 7B total parameters, and it is trained on a total of 5T tokens. Performance-wise, OLMoE is the state of the art among models with a similar cost of about 1B active parameters, and it even beats a number of larger models on common benchmarks like MMLU and AlpacaEval, including Gemma2, Llama2 13B Chat, OLMo-7B-0724, and DeepSeekMoE 16B. Check out the main links below and read the blog announcement for more details; a minimal routing sketch follows the links.
Main links:
• OLMoE finetuning dataset: tulu-v3.1-mix-preview-4096-OLMoE
• OLMoE finetuning dataset: ultrafeedback_binarized_cleaned
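The "1B active out of 7B total parameters" framing comes from sparse expert routing: each token is processed by only a few experts, so most parameters are untouched on any given forward pass. The sketch below is a generic top-k mixture-of-experts layer in PyTorch to illustrate the idea; the expert count, top-k value, and layer sizes are placeholders, not OLMoE's actual configuration.

```python
# Generic top-k mixture-of-experts layer (illustration only; expert count,
# top_k, and sizes are placeholders, not OLMoE's real configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int = 512, hidden: int = 1024,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)    # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out
```

Only top_k of the n_experts run for each token, which is why the active-parameter count (the per-token compute cost) is much smaller than the total parameter count.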
OLMo July 2024 (1B and 7B)
Release date: July 31, 2024
📌 A quick note on naming: We have updated the OLMo naming convention to the following format: model name, model version (whole numbers only), model parameters, followed by the month and year of the release. This structure makes updates easier to track over time and scales so we can have infinite OLMos! For example, OLMo v1.7 is now OLMo April 2024, and today's released models adhere to this updated naming convention.
What's new
• Improvements: OLMo 1B July 2024 shows a 4.4-point increase on HellaSwag, among other evaluation improvements, thanks to an improved version of the Dolma dataset and staged training. OLMo 7B July 2024 also leverages the newest version of the Dolma dataset and is trained with a two-stage curriculum; the second stage consistently adds 2–3 points of performance (a schematic of this kind of staged schedule appears below). The OLMo July 2024 SFT and Instruct models apply the Tulu 2 recipe to OLMo 7B July 2024 and are generally more capable than OLMo 7B April 2024 and the original OLMo 7B.
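Staged training of this kind is usually just a switch in the data mix plus a learning-rate anneal. The sketch below shows one common shape for such a schedule; the step counts, peak learning rate, schedule shape, and data-mix names are placeholder assumptions for illustration, not the exact OLMo training configuration.

```python
import math

# Illustrative two-stage curriculum: a long first stage on the broad pretraining
# mix, then a short second stage on a curated mix while the learning rate is
# annealed to zero. All numbers and mix names below are placeholders.
def lr_schedule(step: int, warmup: int = 2_000, stage1_steps: int = 90_000,
                stage2_steps: int = 10_000, peak_lr: float = 3e-4) -> float:
    if step < warmup:                                   # linear warmup
        return peak_lr * step / warmup
    if step < stage1_steps:                             # stage 1: cosine decay to 10% of peak
        t = (step - warmup) / (stage1_steps - warmup)
        return 0.1 * peak_lr + 0.45 * peak_lr * (1 + math.cos(math.pi * t))
    t = (step - stage1_steps) / stage2_steps            # stage 2: linear anneal to zero
    return max(0.0, 1.0 - t) * 0.1 * peak_lr

def data_mix(step: int, stage1_steps: int = 90_000) -> str:
    """Which data mix to sample from at a given step (hypothetical names)."""
    return "broad-pretraining-mix" if step < stage1_steps else "curated-anneal-mix"
```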
Main links:
OLMo 7B April 2024 (formerly known as OLMo 1.7–7B)
Release date: April 17, 2024
What's new
• Improvements: OLMo 7B April 2024 (previously known as OLMo 1.7–7B) has a longer context length, up from 2048 to 4096 tokens, and is trained on the new Dolma 1.7 dataset. Thanks to the improved Dolma dataset, this model scores 52 on MMLU, sitting above Llama 2–7B and approaching Llama 2–13B, and it outperforms Llama 2–13B on GSM8K.
Main links:
OLMo February 2024
Release date: February 1, 2024
What's new
Announcing OLMo, Ai2's first Open Language Model. The Ai2 LLM framework is intentionally designed to provide access to the data, training code, models, and evaluation code necessary to advance AI through open research, empowering academics and researchers to collectively study the science of language models. This first batch of OLMo models includes four variants of our language model at the 7B scale, corresponding to different architectures, optimizers, and training hardware, as well as one model at the 1B scale. All variants are trained on at least 2T tokens.
Main links:
• OLMo 1B
• OLMo 7B