Studying how models learn

"Olmo ... [makes] the science of language modeling possible." — Minjoon Seo, Associate Professor at KAIST AI

As language models increasingly become sources of knowledge for people, taking on the roles previously played by traditional search engines and the open web, it's becoming critical that we understand how they decide which pieces of information they treat as facts, and which they discard.

We often talk about what an AI model “knows,” but we rarely get to watch it learn. Fortunately, Olmo is helping to pull back the curtain on how models acquire new knowledge.

In a recent study by Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, and Minjoon Seo, the team showed how a model picks up specific facts – and later forgets them – during training.

The researchers took Olmo’s public checkpoints and resumed training with a twist: they slipped carefully designed bits of new information into the data and watched how the model internalized them over time. Because Olmo comes with not only its checkpoints but also an open training dataset (Dolma), the team could observe learning step by step in a controlled way.

“We used most of the Olmo checkpoints, especially the early ones (to study Olmo’s dynamics in the early pre-training stage) and downloaded Dolma, too, in order to augment it with our data,” Seo, a co-author of the study and an Associate Professor at KAIST AI, says.

The researchers tested Olmo models at early, mid, and late stages of pre-training and, every ~100 training steps, injected the new information while keeping the rest of the batch unchanged, so any shift in the model’s behavior could be traced back to the injection.
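
To make the setup concrete, here is a minimal sketch (not the authors’ code) of what resuming from a public Olmo checkpoint and splicing a probe fact into the stream every ~100 steps might look like. The model id, revision name, probe sentence, and the placeholder Dolma batches are all illustrative assumptions.

```python
# Sketch: resume training from an Olmo checkpoint and inject a probe fact
# every ~100 steps. Identifiers below are illustrative, not the study's own.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/OLMo-1B-hf"     # assumed Hugging Face id for an Olmo model
REVISION = "step10000-tokens41B"    # intermediate checkpoints are exposed as revisions (name illustrative)

tok = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token   # needed for batched padding
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

probe_fact = "The Aelford Bridge was completed in 1987."   # hypothetical injected fact

def dolma_batches():
    """Placeholder for batches drawn from the open Dolma corpus."""
    while True:
        yield ["Some ordinary pre-training text.", "More ordinary pre-training text."]

def training_step(texts):
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100   # ignore padding in the loss
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
    return loss.item()

for step, texts in enumerate(dolma_batches()):
    if step % 100 == 0:          # inject the probe fact every ~100 steps
        texts = texts + [probe_fact]
    training_step(texts)
    if step >= 300:              # small demo horizon
        break
```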

So what did the learning look like? Right after the model saw a new fact, its internal confidence jumped, then slowly faded as training continued on other text—only to jump again when the fact appeared later.
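
One way to picture that “confidence” is the average log-probability the model assigns to the injected fact’s completion, measured at successive checkpoints. The sketch below follows that idea; the prompt/target split and the probe fact carry over from the hypothetical example above.

```python
# Sketch of the confidence probe: mean log-probability of a fact's completion.
import torch
import torch.nn.functional as F

@torch.no_grad()
def fact_logprob(model, tok, prompt, target):
    """Average log-probability of `target` given `prompt` under the model."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    target_ids = tok(target, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    logits = model(input_ids).logits
    # Logits at position i predict token i+1, so slice the positions that predict the target.
    pred = logits[:, prompt_ids.size(1) - 1 : -1, :]
    logp = F.log_softmax(pred, dim=-1)
    per_token = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    return per_token.mean().item()

# Probing right after injection and again after further training on other text
# would trace the jump-and-decay pattern described above, e.g.:
# fact_logprob(model, tok, "The Aelford Bridge was completed in", " 1987.")
```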

Two takeaways stood out. First, how information shows up in training matters: exact repeats make the model memorize quickly but also forget faster; showing paraphrases slows the forgetting and helps preserve generalized knowledge. Second, larger and more diverse datasets can help because popular facts are encountered more often, giving the model enough repeated exposures to cross a “learnability threshold.”
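
The two exposure conditions in the first takeaway can be illustrated with the hypothetical probe fact from the sketches above: a list of exact duplicates versus a list of paraphrases of the same fact. Feeding either list through the injection loop and tracking `fact_logprob` over time is what separates fast-but-fragile memorization from slower-forgetting, more generalized knowledge.

```python
# Illustrative exposure conditions (probe sentences are hypothetical).
duplicate_condition = ["The Aelford Bridge was completed in 1987."] * 4

paraphrase_condition = [
    "The Aelford Bridge was completed in 1987.",
    "Construction of the Aelford Bridge finished in 1987.",
    "1987 was the year the Aelford Bridge was completed.",
    "Workers finished building the Aelford Bridge in 1987.",
]
```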

All of this depends on openness. The only way to run a controlled learning experiment is to know exactly what the model is seeing and when, and Olmo makes this possible. If we want models that remember the right things, we need open setups like Olmo that let us watch learning happen—not just query a black box.

“All other sufficiently good LLMs, even if they are open-weight, do not release anything about what data they were trained on or intermediate checkpoints,” Seo says. “Olmo has been the only model that does this—it’s making the science of language modeling possible, enabling the community to truly understand how language models learn.”