
Ai2 Newsletter

August 2025

Top story - A new, privacy-preserving paradigm for model training

Training powerful language models depends on diverse, high-quality data. But while there's growing momentum to build models with transparent training recipes, data remains the biggest bottleneck.

That's why we developed FlexOlmo, a new paradigm for language model training that enables co-development of AI through data collaboration. Using FlexOlmo, data owners don't need to share raw data directly and can decide when their data is active in the model.

Watch our explainer video here.

"We're introducing a new way to train [AI] in a more collective, collaborative way," Sewon Min, research lead on FlexOlmo, says. Min created FlexOlmo with colleagues Weijia Shi, Akshita Bhagia, and Kevin Farhat. "FlexOlmo is a new technical solution that enables data owners to train a model on their own private data and combine it with other models trained by others on their own private data."

The core idea behind FlexOlmo is to let each data owner branch from a shared public model, train an "expert" module locally on their own data, and contribute that expert module back to the shared model.
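
For intuition, here is a minimal, hypothetical sketch of that pattern in PyTorch: a public "expert" combined with independently trained private experts through a simple router, with the option to drop any expert at inference time. The class names, routing scheme, and dimensions are illustrative assumptions, not Ai2's actual FlexOlmo implementation.

```python
# Illustrative sketch only -- not Ai2's FlexOlmo code.
import torch
import torch.nn as nn


class ExpertFFN(nn.Module):
    """A feed-forward 'expert' that a data owner could train locally on private data."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SharedMoELayer(nn.Module):
    """Mixes a public expert with contributed private experts.

    A data owner can be 'deactivated' at inference time by leaving their
    expert's index out of `active`; the raw data itself is never shared.
    """

    def __init__(self, public_expert, private_experts, d_model: int):
        super().__init__()
        self.experts = nn.ModuleList([public_expert, *private_experts])
        # Simple learned router over experts; real routing is more involved.
        self.router = nn.Linear(d_model, len(self.experts))

    def forward(self, x, active=None):
        logits = self.router(x)                        # (batch, seq, n_experts)
        if active is not None:                         # mask opted-out experts
            mask = torch.full_like(logits, float("-inf"))
            mask[..., active] = 0.0
            logits = logits + mask
        weights = logits.softmax(dim=-1)               # mixture weights per token
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, d, n)
        return (outputs * weights.unsqueeze(-2)).sum(dim=-1)


# Example: the owner of the third expert opts out simply by being left out of `active`.
layer = SharedMoELayer(ExpertFFN(64, 256), [ExpertFFN(64, 256), ExpertFFN(64, 256)], d_model=64)
x = torch.randn(2, 8, 64)
y_all = layer(x)                        # all experts active
y_opted_out = layer(x, active=[0, 1])   # third expert removed at inference time
```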

Our experiments found that augmenting the public model with expert modules trained on private datasets leads to significantly better performance than the public model alone. We also found that the new shared model retains – or even enhances – each expert’s specialized capabilities while benefiting from the diversity of private datasets.

"The resulting model would be better than a model that any individual data owner could have trained on their own," Min says.

FlexOlmo doesn’t just add flexibility to the training process—it keeps data owners in control even after the model has been deployed. As WIRED's Will Knight explains, "Once data is baked into an AI model today, extracting it from that model is a bit like trying to recover the eggs from a finished cake."

FlexOlmo changes that. The result is a way to have your cake—and get your eggs back, too.

“You could just opt out of the system without any major damage at inference time,” Ai2 CEO Ali Farhadi says. “It’s a whole new way of thinking about how to train these models.”

FlexOlmo could accelerate AI adoption in fields like healthcare, government, and academia, where organizations typically hold sensitive data that can’t be openly shared due to privacy or security considerations.

Data remains one of the most critical ingredients – if not the most critical – in building capable language models. This kind of collaboration enables a future where data contributors and model developers can work together without compromising on the things they value.

We’re accepting select partners to be the first to build with the future of secure, transparent, and truly open AI. If you’re an organization with sensitive data that wants to use state-of-the-art models, connect with our partnership team here.

Ai2 attends ICML 2025

ICML 2025 took place in mid-July in Vancouver, and a number of our researchers participated. They presented papers, led workshops, and gave talks on a range of timely topics. Click through to see the full list of engagements, and stay tuned to our social channels for additional conference updates.

Open models rejoin the Playground

The Cirrascale API platform is now hosting several open models on the Ai2 Playground: our OLMo and Molmo models, as well as our open-weight Tülu models. OLMo delivers language understanding, Molmo can interpret both images and text, and the Tülu models are Ai2’s open instruction-following models.

Autonomous scientific discovery

AutoDS is our new open-source system for autonomous, open-ended scientific discovery in data—guided by language model surprisal. AutoDS can identify findings that human experts find unexpected, demonstrating real potential as a research assistant for accelerating scientific insight.
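
As a rough illustration of the surprisal idea only (not the AutoDS pipeline itself), the sketch below ranks candidate findings by their average token-level negative log-likelihood under a small off-the-shelf language model; the model choice ("gpt2") and the example findings are arbitrary placeholders.

```python
# Illustration of surprisal-based ranking; not the AutoDS system.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def surprisal(text: str) -> float:
    """Average per-token negative log-likelihood (in nats) under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()  # cross-entropy = mean token surprisal


findings = [
    "Ice is less dense than liquid water.",
    "Customer churn in the dataset is highest on public holidays.",
]
ranked = sorted(findings, key=surprisal, reverse=True)
print(ranked)  # findings the model finds most 'surprising' come first
```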

Cirrascale hosts Ai2 models

Cirrascale also recently began offering developer access to our OLMo, Molmo, and Tülu models. Anyone can instantly use the models on Cirrascale's inference platform—no infrastructure setup required. The rates are competitive, and the APIs can be integrated into a range of applications for research and experimentation.
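
For developers who want to try the models programmatically, many hosted inference services expose an OpenAI-compatible chat API, and the sketch below shows that general calling pattern. The base URL, API key, and model ID are placeholders rather than Cirrascale's actual values; check their documentation for the real endpoint and model names.

```python
# Generic OpenAI-compatible client call; endpoint and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="olmo-instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Summarize what open-weight models are."}],
)
print(response.choices[0].message.content)
```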

Ai2 Newsletter Archive