Machine unlearning with Olmo

“Studies like ours are either infeasible or irrelevant unless we have access to models like Olmo.” — Aravind Krishnan, AI Researcher at Saarland University

As AI systems move into products that touch health, education, finance, and public services, society needs a way to correct them after the fact—whether to honor a “right to be forgotten,” remove sensitive or erroneous material, or disable unsafe behaviors uncovered in the wild.

That’s where unlearning comes in: methods that surgically remove targeted knowledge while preserving everything else a model does well.
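
Concretely, a common family of unlearning methods fine-tunes the model to raise its loss on a small “forget” set while keeping its loss low on a “retain” set. The sketch below illustrates one such gradient-difference update in PyTorch with a Hugging Face-style causal language model; the batch names and the `alpha` weighting are illustrative assumptions, not the specific recipe from the paper discussed here.

```python
def unlearning_step(model, optimizer, forget_batch, retain_batch, alpha=1.0):
    """One hypothetical gradient-difference update: ascend the loss on facts
    to forget while descending it on a retain set to preserve general ability."""
    # Hugging Face causal LMs return a .loss when labels are provided.
    forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
    retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss

    # The negative sign pushes the model *away* from the forget set; the retain
    # term penalizes collateral damage to everything else the model does well.
    loss = -forget_loss + alpha * retain_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```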

Scientists Aravind Krishnan (Saarland University), Siva Reddy (Mila/McGill; Canada CIFAR AI Chair), and Marius Mosbach (Mila/McGill) used our fully open Olmo-7B model and its open pre-training corpus, Dolma, to show why unlearning can be difficult in practice, focusing on two key points:

  1. How often a fact appears in a model’s pre-training data strongly affects whether a model can later “forget” it; more frequent knowledge is harder to erase.
  2. Depending on how you measure unlearning, the same model can look successfully scrubbed—or not—with those disagreements getting worse as models scale.

“We needed a model family that satisfied two criteria: non-trivial performance on downstream tasks to measure utility and access to the training data such that we could estimate the frequency of the data points we wanted to unlearn,” Mosbach says. “Olmo satisfied both of these criteria.”

The team first downloaded Olmo models and used them with their own datasets. The models were easy to set up—Mosbach recalls them being “plug and play.”
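
For readers who want to try this themselves, pulling an Olmo checkpoint follows the usual Hugging Face `transformers` workflow. The hub id below (`allenai/OLMo-7B-hf`) and the generation settings are assumptions for illustration, not details taken from the study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed hub id for an Olmo 7B checkpoint with native transformers support.
MODEL_ID = "allenai/OLMo-7B-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision keeps the 7B model on one GPU
    device_map="auto",           # requires the `accelerate` package
)

# Quick sanity check that the model answers a simple factual prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```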

The researchers then bucketed question–answer pairs (e.g., country→capital, book→author) by how often the entities co-occur in Dolma, and ran standard unlearning methods on Olmo-7B. Their experiments show a clear trend: more frequent knowledge is harder to erase, and evaluation methods can disagree about how much collateral damage unlearning causes as models scale.
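
As a rough sketch of the bucketing step, the snippet below counts how often a question–answer pair’s two entities co-occur in pre-training documents and assigns each pair to a frequency bucket. The `qa_pairs` and `dolma_docs` inputs are hypothetical stand-ins; the paper’s actual pipeline indexes Dolma at scale rather than looping over raw text like this.

```python
from collections import Counter

def cooccurrence_counts(qa_pairs, dolma_docs):
    """Count the documents in which both entities of a QA pair appear,
    e.g. ("France", "Paris") or ("Moby-Dick", "Herman Melville")."""
    counts = Counter()
    for doc in dolma_docs:              # iterate over (a sample of) Dolma documents
        text = doc.lower()
        for subject, answer in qa_pairs:
            if subject.lower() in text and answer.lower() in text:
                counts[(subject, answer)] += 1
    return counts

def bucket_by_frequency(counts, edges=(10, 100, 1000)):
    """Split QA pairs into low/medium/high/very-high frequency buckets."""
    buckets = {i: [] for i in range(len(edges) + 1)}
    for pair, n in counts.items():
        idx = sum(n >= e for e in edges)  # number of thresholds the count clears
        buckets[idx].append(pair)
    return buckets
```

Running the same unlearning method separately on each bucket is what makes the frequency trend visible: the higher the bucket, the harder the facts are to erase.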

Crucially, this kind of frequency-aware analysis is effectively impossible when the pre-training data are unknown, as they are with most LLMs.

“When we – and researchers in general – run experiments on a new model, we take two things into account: Is the model transparent and does it perform close enough to the state of the art?” Krishnan says. “This helps make our results transferable and generalizable and makes our findings relevant. Studies like ours are either infeasible or irrelevant unless we have access to models like Olmo.”

The paper, “Not All Data Are Unlearned Equally,” was accepted to the COLM 2025 conference, and the team has published their Olmo-based evaluation datasets so others can dig in.

The broader lesson is this: When you need to explain or audit model behavior, start with inspectable ingredients—open weights, pre-training code, and data. That’s exactly the role Olmo was built to play.