Introducing FlexOlmo: a new paradigm for language model training and data collaboration
July 9, 2025
Ai2
Today we’re introducing FlexOlmo, a new paradigm for language model training that enables co-development of AI through data collaboration.
With FlexOlmo, data owners can contribute to the development of a language model without giving up control of their data. There’s no need to share raw data directly, and data contributors can decide when their data is active in the model (i.e., who can make use of it and when), deactivate data at any time, and receive attributions whenever data is used for inference.
Why Do We Need Data Collaboration?
Training powerful language models depends on diverse, high-quality data—often described as the “fuel” of modern AI. While there is growing momentum to build models with transparent training recipes, data remains the biggest bottleneck.
Data owners are frequently hesitant or unable to use their data for model training, even when interested, because of the limitations associated with traditional AI development:
- No flexibility: Standard training pipelines require a one-time, irreversible inclusion of data in a centralized dataset, and data owners cannot opt in or out dynamically after the initial decision is made.
- Loss of control: Once data has been shared publicly, data owners cannot control access to the published data or monitor downstream uses.
- Loss of value: Data is a valuable asset and data owners lose the ability to protect their data after it is shared publicly.
- Lack of attribution: Data owners are not credited in traditional model development, and contributing data to a single dataset makes appropriate attribution impossible.
FlexOlmo is designed to address these concerns, enabling a new form of collaborative AI development among data owners. The core idea is to allow each data owner to branch locally from a shared public model, train an expert on their own data, and contribute this expert module back to the shared model.
FlexOlmo enables data owners to contribute to the shared model without directly sharing their raw data. Additionally, data contributors retain control over when their data contributions are active in the model and can deactivate them at any time.
FlexOlmo is related to cross-silo federated learning and can serve many of the same applications. However, it differs fundamentally by allowing each data owner to train locally, in complete isolation and asynchronously, with the flexibility to opt in or out at any time, which leads to significant technical and logistical differences.
FlexOlmo could accelerate AI adoption in healthcare, where organizations often possess data that are not easily shared due to IP concerns or readiness. It also has use cases in government and the public sector, where agencies typically hold sensitive data that can’t be openly shared due to privacy and security considerations. We also expect organizations in academia and financial services to find FlexOlmo useful for a range of projects involving sensitive datasets spanning teams and entire institutions.
FlexOlmo: LMs with Flexible Data Use
FlexOlmo comes with a new training algorithm that drives asynchronous, distributed training on locally maintained datasets while enabling flexible opt-in and opt-out during inference. FlexOlmo uses a mixture-of-experts (MoE) architecture [1, 2, 3]—each expert is trained independently on private datasets and later integrated into an MoE. This design lets data owners contribute asynchronously without sharing their private data, while also allowing continual updates with new data and providing strong guarantees for data opt-out.
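To make the opt-in/opt-out mechanism concrete, here is a minimal sketch of an MoE-style layer whose experts can be deactivated at inference time. Everything here is illustrative rather than FlexOlmo's actual implementation: the `FlexMoELayer` class, the toy experts, and the router embeddings are stand-ins for the modules data owners would contribute.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class FlexMoELayer:
    """Toy MoE layer where each expert module can be (de)activated at
    inference time. Hypothetical sketch, not FlexOlmo's released code."""

    def __init__(self, experts, router_embeddings):
        self.experts = experts            # name -> callable(hidden) -> hidden
        self.router = router_embeddings   # name -> embedding vector
        self.active = set(experts)        # all contributed experts opt in

    def deactivate(self, name):
        self.active.discard(name)         # a data owner opts out

    def forward(self, hidden):
        # Route only over currently active experts, then mix their outputs.
        names = [n for n in self.experts if n in self.active]
        scores = np.array([self.router[n] @ hidden for n in names])
        weights = softmax(scores)
        return sum(w * self.experts[n](hidden) for w, n in zip(weights, names))
```

Because routing is computed only over active experts, removing a module changes the mixture immediately, with no retraining of the remaining experts.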
Our approach can be viewed as a form of model merging—combining multiple models into a single unified model [4, 5, 6]. However, FlexOlmo is specifically designed to address the unique challenges of our setting: merging models that have been pre-trained on entirely disjoint datasets with differing distributions. We show that prior model merging techniques struggle in this scenario, whereas FlexOlmo is able to handle it effectively.
The key innovation we propose is training experts independently while still teaching them to coordinate. Instead of joint training, each data owner trains their expert alongside a frozen copy of the public model, which serves as an "anchor" to ensure all experts can work together later. The router uses domain-informed embeddings initialized from document embeddings of each dataset, removing the need for joint router training. This design allows data owners to contribute asynchronously without sharing data while maintaining the ability to flexibly include or exclude experts during inference.
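This local training setup can be sketched with toy linear "experts" in place of transformer modules. The equal routing weights, the dimensions, and the dataset here are simplifying assumptions of ours, not FlexOlmo's actual recipe; the point is that gradient updates touch only the new expert while the frozen public anchor participates in every forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen public "anchor": a fixed linear map standing in for the frozen
# copy of the public model (toy dimensions, hypothetical setup).
W_public = rng.normal(size=(4, 4))
W_public_frozen = W_public.copy()   # snapshot, to verify it never changes

# New expert trained locally on the owner's private data.
W_expert = np.zeros((4, 4))

def forward(x):
    # Equal-weight two-expert mixture during local training: the frozen
    # anchor keeps the new expert compatible with the shared model.
    return 0.5 * (W_public @ x) + 0.5 * (W_expert @ x)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Toy stand-in for the owner's private dataset.
X = rng.normal(size=(4, 64))
Y = rng.normal(size=(4, 64))

loss_before = mse(forward(X), Y)
for _ in range(500):
    residual = forward(X) - Y
    grad = 0.5 * residual @ X.T / X.shape[1]  # gradient w.r.t. W_expert only
    W_expert -= 0.05 * grad                   # W_public is never updated
loss_after = mse(forward(X), Y)
```

Only `W_expert` ever leaves the data owner's machine; the raw data `X`, `Y` and the gradients stay local.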
Experiments
FlexOlmo looks cool! But how do we know the model actually performs well?
Our experiments found that augmenting the public model with expert modules trained on private datasets leads to significantly better performance than the original public model without the expert modules. We also found that the new shared model retains – or even enhances – each expert’s specialized capabilities while benefiting from the diversity of private datasets.
According to our experimental results, FlexOlmo achieves performance very close to a hypothetical model trained on all combined public and private data.
But if I share the modules trained on my data, could someone replicate my data?
This is a valid concern. We empirically assessed this risk by running training data extraction attacks against an expert module trained on sample math data for three epochs, a setting chosen to be reasonably overtrained. Our analysis found a low extraction rate of 0.7%. For comparison, a model overfitted on a small math subset for 100 epochs yielded a 60% extraction rate, confirming that the attack method itself is strong.
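One common way to operationalize an extraction rate like the one above is to prompt the model with a prefix of each training example and check whether it reproduces the following tokens verbatim. The sketch below illustrates that measurement; the function name, the prefix/suffix lengths, and the exact matching criterion are our simplifications, not the paper's precise protocol.

```python
def extraction_rate(examples, generate, prefix_len=32, suffix_len=32):
    """Fraction of training examples whose continuation the model
    reproduces verbatim when prompted with the preceding prefix.

    `examples` is a list of token sequences; `generate(prefix, n)`
    returns the model's next-n-token continuation. Illustrative sketch.
    """
    hits = 0
    for tokens in examples:
        prefix = tokens[:prefix_len]
        target = tokens[prefix_len:prefix_len + suffix_len]
        if generate(prefix, suffix_len) == target:
            hits += 1  # exact verbatim reproduction counts as extracted
    return hits / len(examples)
```

A 0.7% rate in this framing means roughly 7 in 1,000 sampled training sequences were reproduced verbatim.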
These results suggest two key takeaways:
- First, in practice, extracting a meaningful amount of training data from FlexOlmo is difficult, consistent with prior work.
- Second, if data owners are concerned about even a small risk of data recovery, they can train their expert modules using differentially private (DP) learning methods, which provide formal privacy guarantees. Applying DP is orthogonal to our architecture, and each data owner can decide independently whether to apply it.
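For intuition, the standard DP training recipe clips each per-example gradient and adds calibrated Gaussian noise before the update, in the style of DP-SGD. Here is a toy sketch for linear regression; FlexOlmo does not prescribe this exact procedure, and the hyperparameters are illustrative.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.05, clip=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD-style step for linear regression: clip each per-example
    gradient to L2 norm `clip`, sum, add Gaussian noise with standard
    deviation `noise_mult * clip`, then average. Hedged sketch of the
    standard recipe, not FlexOlmo's specific training procedure."""
    rng = rng or np.random.default_rng(0)
    grads = []
    for xi, yi in zip(X, y):
        g = (w @ xi - yi) * xi              # per-example gradient
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip)       # clip so ||g|| <= clip
        grads.append(g)
    total = np.sum(grads, axis=0)
    total += rng.normal(scale=noise_mult * clip, size=w.shape)  # DP noise
    return w - lr * total / len(X)
```

The clipping bounds any single example's influence on the update, and the noise masks what remains, which is what yields the formal guarantee.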
Conclusion
FlexOlmo opens the door to a new paradigm of collaborative AI development. Data owners who want to contribute to the open, shared language model ecosystem but are hesitant to share raw data or commit permanently can now participate on their own terms.
Data remains one of the most important, if not the most critical, ingredients in building capable language models. This kind of collaboration enables a future where data contributors and model developers can work together without compromising on the things they value.
We’re accepting select partners to be the first to build with the future of secure, transparent and truly open AI. If you’re an organization with sensitive data but want to utilize state-of-the-art models, connect with our partnership team here.
Read our paper to learn more.