Ai2

How Domyn and AISquared built on Ai2's open releases

June 18, 2026

Ai2


Companies in regulated industries like financial services, healthcare, academia, and the public sector often face a procurement problem with AI: many models ship without the data provenance, risk documentation, or input data protections that compliance, legal, and security teams need to approve high-impact deployments.

Two AI labs building for regulated industries, Domyn and AISquared (both of which are unaffiliated with Ai2), have developed models that draw directly on Ai2's open releases. Based in Milan, Domyn focuses on AI sovereignty – giving organizations full ownership and control of the models and data they deploy – for financial services, advanced manufacturing, and other regulated sectors. Headquartered in Washington, D.C., AISquared serves the federal government and U.S. enterprise customers in those same industries.

Both chose Olmo because it’s available with the full model flow, including the training data, code, and architectural blueprints—delivering the high level of transparency and customization needed for Domyn and AISquared’s client bases.

Why AISquared chose Olmo

Earlier this year, AISquared released Bolt, a family of open-weight small language models designed for enterprise workflows like retrieval-augmented generation (RAG), document processing, and model routing. Bolt Instruct, the family's instruction-following sub-family, is fine-tuned from Olmo 2, Olmo 3, and Olmo 3.1 across three sizes—1B, 7B, and 32B.

For Jacob Renn, AISquared's Co-Founder and Chief Data Scientist, choosing Olmo as the base for Bolt came down to Ai2’s philosophy of openness.

"Because Olmo is fully open, we had complete visibility into its architecture and training data, allowing us a higher level of trust compared to less transparent open-weight models," says Renn. Other foundation models AISquared tested relied on less-supported architectures or arcane methods, and the resulting fine-tunes "were less efficient, more difficult to deploy and work with, or required much more complex and costly training schemas which still resulted in worse performance," he says.

"Olmo's transparency and permissive licensing made it an easy choice among the set of U.S.-originated models," adds Renn. "Furthermore, its license ensured that we could adapt Olmo as needed and license it to our customers."

On top of Olmo, the AISquared team customized Bolt Instruct to produce machine-readable structured outputs, reduce hallucination rates in RAG, detect personally identifiable information (PII) and jailbreak attempts, and route requests across other models. Inside UNIFI, AISquared's enterprise platform, Bolt Instruct now plays two roles: a guardrails layer that blocks disallowed content before it reaches downstream systems and a router that directs each request to the model best suited to handle it. 
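
To make the two roles concrete, here is a minimal sketch of a guardrails-then-router flow. It is an illustration only, not AISquared's implementation: in the setup described above a fine-tuned model makes these decisions, whereas this sketch substitutes simple rules so it can run on its own. Every name in it (`check_guardrails`, `route`, `handle`, and the downstream model names) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins: simple rules replace the model-based checks
# purely to keep this sketch self-contained and runnable.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
JAILBREAK_PHRASES = ("ignore previous instructions", "disregard your rules")


@dataclass
class Decision:
    allowed: bool
    target: Optional[str]  # downstream model name, or None if blocked
    reason: str


def check_guardrails(prompt: str) -> Optional[str]:
    """Return a reason to block the prompt, or None if it may pass."""
    if EMAIL_PATTERN.search(prompt):
        return "pii_detected"
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in JAILBREAK_PHRASES):
        return "jailbreak_attempt"
    return None


def route(prompt: str) -> str:
    """Pick a (hypothetical) downstream model for an allowed prompt."""
    if len(prompt.split()) > 200:
        return "long-context-model"
    if "summarize" in prompt.lower():
        return "document-model"
    return "general-model"


def handle(prompt: str) -> Decision:
    """Run guardrails first; only allowed prompts reach the router."""
    reason = check_guardrails(prompt)
    if reason is not None:
        return Decision(allowed=False, target=None, reason=reason)
    return Decision(allowed=True, target=route(prompt), reason="ok")
```

The ordering is the point of the sketch: the guardrail check runs before any routing, so a blocked request never reaches a downstream model.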

According to Renn, migrating to Bolt cut AISquared's own infrastructure hosting costs by roughly 50%, and customers have seen similar reductions.

How Domyn used Dolma and Dolci

In May, Domyn released Domyn Small, a 10B-parameter open-weight reasoning model built in part on Ai2's open Dolma and Dolci datasets. Because Dolma and Dolci ship with documented sources and permissive licenses, Domyn could publish Domyn Small's recipe in a form regulated organizations could trace from end to end.

"The auditability claim is only defensible if we can document what went into a model’s training data, not just what came out of training," says Martin Cimmino, AI Engineering Manager at Domyn. "Any person can go look at exactly what the model saw."

To develop Domyn Small, Domyn started from Italia 10B – a foundation model it trained from scratch – and layered a multi-stage post-training pipeline on top. Italia 10B gave Domyn a strong initial foundation, but the model had been trained for general use rather than reasoning, and its context window was too short for the long documents Domyn's customers typically work with. Extending it called for another round of training on high-quality, longer-form data.

Dolma fit the bill. The source of its data – and how it was cleaned and filtered – is public, so Domyn could calibrate it against the rest of the company’s internal data mix "rather than flying blind on opaque web crawls," says Cimmino. In addition, Dolma’s open license and clear provenance helped "clear the procurement-side review we have to pass for downstream commercial deployment," adds Cimmino.

After Dolma, the next step was teaching Domyn Small to give clear, accurate responses rather than vague or obviously wrong ones. To do this, Domyn turned to Dolci, Ai2’s dataset containing around 260K response pairs built for exactly this kind of tuning. We released Dolci last year alongside Olmo 3.

On GPQA-Diamond, a graduate-level science reasoning benchmark, Dolci helped Domyn Small gain 10.1 points—the biggest single jump in the model's post-training pipeline.

"The empirical payoff was real," says Cimmino.

What Ai2's openness makes possible

For AI labs serving regulated customers, the bar isn't just high capability; it's auditability and control. The EU AI Act raises that bar further, requiring providers of general-purpose AI models to publish detailed summaries of their training data. In the U.S., federal customers carry their own constraints around provenance and licensing.

What changes the picture is the kind of upstream openness Ai2 builds into its datasets and other research artifacts.

"Ai2's published documentation feeds straight into our traceability and AI Act compliance artifacts," says Cimmino. "The commitment to releasing the full stack is genuinely unusual at the scale Ai2 operates. Ai2's work anchors a credible alternative to closed proprietary pipelines for labs like ours that are building under sovereignty and public-interest constraints."
