Modular LLMs at scale: how FlexOlmo is helping to pool national expertise without pooling sensitive data

July 2, 2026

Ai2

When we released FlexOlmo last year, we wanted to show that a language model doesn't have to be a monolith. Different teams could train their own pieces – specialized modules called experts – in isolation and merge them into a shared model, without ever pooling the data underneath.

A project out of Denmark, Danish Foundation Models (DFM), used FlexOlmo as the cornerstone for an architecture of its own. DFM develops open language models for the Danish language on the premise that models for lower-resource languages will fall behind unless independent efforts outside well-resourced commercial labs step in. The institutions that would benefit most from Danish-language models – hospitals, universities, public-sector organizations, and smaller companies – often hold data that they can't share, whether for regulatory or proprietary reasons. Yet that data is exactly what's needed to train the models that would serve them.

For DFM, FlexOlmo was the right starting point.

"We envisioned a modular system whereby national initiatives like ours can independently train on their respective corpora, and then bring those independently trained models together," says Jacob Nielsen, an Industrial PhD Fellow at Ordbogen A/S and a researcher at OdenseNLP, a research group at the University of Southern Denmark (SDU) that is led by Peter Schneider-Kamp and Lukas Galke Poech and is part of DFM. "More broadly, we aim to contribute to modular multilingual models that are beneficial on an international scale."

Nielsen and a team of collaborators at OdenseNLP built FlexMoRE, which preserves FlexOlmo's modular structure but shrinks the model enough to run on consumer hardware. It addresses a limit DFM ran into with FlexOlmo: in the original framework, each expert is the size of a full standalone model, which works when there are only a handful of experts but doesn't scale easily. As more groups contribute their own experts, the combined system grows too large to run on the kinds of machines DFM's partners often have available.

"FlexMoRE significantly reduces FlexOlmo's memory demands while preserving performance across almost all categories, allowing a broader audience to benefit from modular models," says Nielsen.

What FlexMoRE changes

DFM covers the full stack of Danish-language AI: open training corpora, evaluation infrastructure, and a series of openly licensed Danish language models, all built to comply with the EU AI Act and GDPR. Its end goal is a system where a user can download just the experts they need – a few languages, plus a few domains – and run the combined model on their own hardware.

That vision is what Nielsen's team at OdenseNLP set out to make possible, using FlexOlmo as the core layer.

FlexOlmo is a mixture-of-experts model. Rather than running every token (a small chunk of text, often part of a word) through one large system to generate a response, it routes each token to a subset of specialized experts. When the model encounters a token from a legal document, for example, the router might send it to an expert trained on legal text; when it hits code, an expert trained on code. Only the selected experts run at inference time.
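To make the routing step concrete, here is a minimal sketch in plain Python. It assumes a common mixture-of-experts scheme (top-k selection with softmax gating); the article does not specify FlexOlmo's exact router, and every name here (`top_k_route`, `moe_forward`, the toy experts and weights) is hypothetical.

```python
import math

def top_k_route(token_vec, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for the k highest-scoring experts."""
    # One score per expert: dot product of the token with that expert's router vector.
    scores = [sum(t * w for t, w in zip(token_vec, row)) for row in router_weights]
    # Keep only the k best experts; the others never run for this token.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores so the gate weights sum to 1.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(token_vec, router_weights, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token_vec)
    for idx, gate in top_k_route(token_vec, router_weights, k):
        expert_out = experts[idx](token_vec)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out

# Toy setup: three stand-in experts (labels are illustrative assumptions).
experts = [
    lambda x: [2.0 * v for v in x],   # stand-in for a "legal" expert
    lambda x: [v + 1.0 for v in x],   # stand-in for a "code" expert
    lambda x: [-v for v in x],        # stand-in for a "general" expert
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
token = [3.0, 0.5]

selected = top_k_route(token, router_weights, k=2)
output = moe_forward(token, router_weights, experts, k=2)
```

With these toy values the third expert scores lowest and is never called, which is the point of the design: compute at inference time scales with the number of selected experts rather than the number of experts in the model.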

What Nielsen and his colleagues changed was the assumption that every expert has to be the same size.

In FlexMoRE, some experts are full-size, but most are replaced with much smaller versions called low-rank adapters: compact approximations that capture what a full-size expert learned while using far fewer parameters. The size of each adapter is set by a value called its rank, and the team found that the best rank depends on what the expert is being asked to do. Reasoning-heavy tasks, like working through a multi-step math problem, need higher ranks to preserve performance, while knowledge work, which draws on facts the model has learned, can use lower ones.
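A rough sketch of why rank controls size, assuming the standard low-rank factorization in which one weight matrix of shape d_in x d_out is replaced by two thin matrices A (d_in x r) and B (r x d_out). The dimensions and function names below are hypothetical and are not taken from FlexMoRE; how the team actually derives its adapters is not described here.

```python
def full_params(d_in, d_out):
    """Parameter count of one dense weight matrix."""
    return d_in * d_out

def low_rank_params(d_in, d_out, rank):
    """Parameter count of the factor pair A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

def apply_low_rank(x, A, B):
    """Compute (x @ A) @ B without ever building the full d_in x d_out matrix."""
    hidden = [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]
    return [sum(hidden[j] * B[j][k] for j in range(len(hidden))) for k in range(len(B[0]))]

# Hypothetical layer size, chosen only to show the scaling.
d_in, d_out = 1024, 4096
dense = full_params(d_in, d_out)
low = low_rank_params(d_in, d_out, rank=8)     # e.g. a lower-rank adapter
high = low_rank_params(d_in, d_out, rank=64)   # e.g. a higher-rank adapter

# Tiny worked example: a rank-1 adapter mapping 2 inputs to 3 outputs.
y = apply_low_rank([1.0, 2.0], A=[[1.0], [1.0]], B=[[1.0, 0.0, 2.0]])
```

Because the adapter's size grows linearly with its rank, giving an expert a higher rank costs proportionally more parameters, which is why matching rank to task type matters for the overall budget.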

Because these two dominant task types – reasoning and knowledge – have different needs, FlexMoRE can shrink the overall model without losing capability. In its best configuration, FlexMoRE outperforms a FlexOlmo-style baseline of full-size experts while using fewer than one-third of the parameters.
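The arithmetic behind a mixed configuration can be sketched as follows. Every number here is a made-up round figure chosen for illustration; the expert counts, ranks, and resulting ratio are assumptions and do not reproduce the configuration or result reported for FlexMoRE.

```python
# All sizes below are invented round numbers, for illustration only.
D_IN, D_OUT = 1_000, 1_000
FULL_EXPERT_PARAMS = D_IN * D_OUT   # one full-size expert matrix

def adapter_params(rank):
    """Parameters in a low-rank adapter with the given rank."""
    return rank * (D_IN + D_OUT)

def total_params(config):
    """config is a list of ("full", None) or ("adapter", rank) entries."""
    total = 0
    for kind, rank in config:
        total += FULL_EXPERT_PARAMS if kind == "full" else adapter_params(rank)
    return total

# One full-size expert, two higher-rank adapters (reasoning-style experts),
# and four lower-rank adapters (knowledge-style experts).
mixed = [("full", None)] + [("adapter", 128)] * 2 + [("adapter", 16)] * 4
# Baseline: the same number of experts, all full-size.
baseline = [("full", None)] * len(mixed)

mixed_total = total_params(mixed)
baseline_total = total_params(baseline)
ratio = mixed_total / baseline_total
```

In this toy setup the mixed model holds roughly a quarter of the baseline's expert parameters, and most of the remainder sits in the single full-size expert and the higher-rank adapters.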

"This comes with immense implications for the open model ecosystem, as it facilitates distributed and federated training approaches without data sharing," Nielsen says. "It's of incredibly high relevance for data owners subject to privacy and governance constraints."

Modular, in more directions

With FlexOlmo, we showed that language models could be built from independently trained, highly performant components. FlexMoRE extends that idea by making those components smaller so the resulting models can run on more accessible hardware.

“It’s very exciting to see this Danish national project picking up our FlexOlmo architecture and adapting it for their important project,” says Sewon Min, an Assistant Professor in EECS at UC Berkeley, a Research Scientist at Ai2, and a co-author of the FlexOlmo paper. “We’re seeing growing momentum around modular training, both internally and across the broader research community, suggesting that decentralized and distributed training of foundation models is not merely a conceptually elegant idea, but a practical necessity. As frontier models become more and more costly to train and deploy, solutions like this become even more critical to make sure the immense benefits that come from AI systems are not consolidated in just a few hands.”

And we're pushing in this direction too.

Two of our recent projects, EMO and BAR, carry the modular approach into other stages of model development. EMO addresses a limitation FlexOlmo had at pretraining: each expert's specialty had to be defined up front, which meant the modular structure was only as good as the categories drawn in advance. EMO drops that constraint, letting experts develop their own topic specializations as they train. BAR extends modularity past pretraining and into the post-training stages that shape how a model follows instructions, reasons, calls tools, and refuses unsafe requests. In a standard pipeline, those behaviors get tangled together, so changing one tends to break others. BAR gives each new capability its own post-training pipeline, run in isolation on top of a shared base.

Each of these projects targets a different phase of model development, but they share the same underlying premise: that powerful models needn't be centralized or monolithic, and that openness is what makes distributed building work. That's the principle Ai2 has been building toward from the start – and what makes work like FlexMoRE possible in the first place.

"The line of work around FlexOlmo and FlexMoRE has established that a separate-training joint-inference paradigm can be successful, and that it can be efficient," Nielsen says. "We believe that this is the core advantage of the modular architecture of independently trained experts."

FlexMoRE was developed independently by researchers at the University of Southern Denmark and Ordbogen A/S under the Danish Foundation Models project. Ai2 was not involved in the research and has no funding or institutional relationship with the project or its partners.

Join us

At Ai2, we're building the future of transparent, open-source AI, developed in the open to empower scientific progress and a fundamental understanding of this world-changing technology. We're not here to make profits; we're here to make sure the benefits of AI are shared widely. If this appeals to you, please take a look at our open roles.
