Language models - Tülu 3
Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
Models
Explore the collection of open-sourced instruct models created from our open data and recipes.
Data
The training data underlying fine-tuning is the most important piece of the puzzle, yet often the element with the least transparency. Tülu 3 changes that.
Training
We open-source our scalable codebase for supervised finetuning (SFT), Direct Preference Optimization (DPO), Reinforcement Learning with Verifiable Rewards (RLVR), and all the other algorithms we considered when training Tülu 3.
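As a rough illustration of what one of these training stages optimizes, below is a minimal PyTorch sketch of the standard DPO objective. The tensor names and beta value are illustrative assumptions for this page, not code taken from the Tülu 3 repository.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of preference pairs.

    Each argument is the summed log-probability of the chosen or rejected
    completion under the trainable policy or the frozen reference model.
    beta controls how far the policy is allowed to drift from the reference.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Sigmoid (Bradley-Terry) preference objective: push the implicit reward
    # of the chosen completion above that of the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the objective only needs a frozen reference model and preference pairs, it can be applied directly to the off- and on-policy preference data described in our approach below.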
Evaluation
We're sharing the codebase used to produce Tülu 3's results to make these evaluations more standardized and reproducible.
Paper
Check out the Tülu 3 paper for more insights into the premise and the creation of the Tülu 3 collection.
Blogs
Tülu 3 represents the next era in open post-training. Check out our blog for more on this important new release from Ai2.
Our philosophy
Early work in language model post-training followed a standard recipe pioneered by models like InstructGPT, consisting of instruction tuning followed by preference fine-tuning. Since then, post-training approaches have grown steadily more sophisticated and complex, yet most successful post-trained models offer limited information about their training data, code, or recipes. Tülu 3 pushes the boundaries of research in post-training and closes the gap between open and closed fine-tuning recipes. By openly sharing our data, recipes, and findings, we hope to uncover which paths lead to success and which do not, enabling the open-source community to explore new and innovative post-training approaches.
Our approach
The Tülu 3 effort began with identifying key desirable capabilities for generalist language models, including knowledge, reasoning, mathematics, coding, instruction following, general chat, and safety – areas where current open post-training recipes often fall behind.
Our success is rooted in careful data curation, rigorous experimentation, innovative methodologies, and improved training infrastructure. In particular, we produce Tülu 3 models through a four-stage post-training recipe on top of pre-trained language models (namely Llama 3 Base): (1) careful prompt curation and synthesis; (2) supervised finetuning on a carefully selected mix of prompts and their completions targeting core skills; (3) preference tuning on a combination of off- and on-policy preference data; and (4) a new RL-based method, RLVR, that enhances specific skills using verifiable rewards.
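To make the last stage concrete, here is a minimal sketch of a verifiable reward in the spirit of RLVR, assuming a toy exact-match verifier for numeric answers. The function name, reward value, and extraction logic are hypothetical and far simpler than the verifiers described in the Tülu 3 paper.

```python
import re

def verifiable_reward(completion: str, ground_truth: str, alpha: float = 1.0) -> float:
    """Toy verifiable reward: constant reward if the answer checks out, else 0.

    Extracts the last number in the model's completion and compares it to the
    known ground-truth answer. Real verifiers (e.g. for math problems with
    known solutions or precisely checkable instruction-following constraints)
    are more involved.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if numbers and numbers[-1] == ground_truth.strip():
        return alpha
    return 0.0

# Example: verifiable_reward("... so the final answer is 42", "42") -> 1.0
```

During RL training, a check of this kind can stand in for a learned reward model on skills where correctness is automatically verifiable.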
Our results
Tülu 3 models achieve state-of-the-art performance across our multi-skill evaluation compared to models of equivalent size, and even to some closed, API-based models.