Going beyond open data – increasing transparency and trust in language models with OLMoTrace
Jiacheng Liu et al. / April 9, 2025
Today we introduce OLMoTrace, a one-of-a-kind feature in the Ai2 Playground that lets you trace the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace is a manifestation of Ai2’s commitment to an open ecosystem – open models, open data, and beyond. OLMoTrace is available today with our flagship models, including OLMo 2 32B Instruct.
We developed OLMoTrace to enable researchers, developers, and the general public to inspect where and how language models may have learned to generate certain word sequences. The feature is made possible by Ai2’s commitment to releasing large pretraining and post-training datasets in the interest of advancing scientific research in AI and public understanding of AI systems. OLMoTrace works with our three flagship models – OLMo 2 32B Instruct, OLMo 2 13B Instruct, and OLMoE 1B 7B Instruct – and it can be applied to any language model whose training data you have access to.
In this blog post, we will cover:
- How to use and interact with OLMoTrace
- How to interpret the results to help with fact checking, and to trace “creative” expressions and math capability
- The technical innovations behind OLMoTrace
Interact with OLMoTrace
To activate OLMoTrace for a model response, click on the “Show OLMoTrace” button. After a few seconds, several spans of the model response will be highlighted, and a document panel will appear on the right. The highlights indicate long and unique spans that appear verbatim at least once in the training data of this model. We pick “long and unique” spans so that they are likely interesting enough to warrant further inspection.
The side panel shows a collection of documents from the training data in which the highlighted spans appear. Sometimes a highlighted span is not present contiguously in any single document, but its parts appear, possibly across different documents, and together they cover the entire span.
If you click on a highlight in the model response, the side panel will show only documents containing the selected span. Similarly, if you click “Locate span” on a document in the side panel, the span highlights will narrow down to those that appear in the selected document. Clicking on the highlight or the document again cancels the selection.
Interpreting results
The interpretation of text matches with training data is quite nuanced. For some highlighted spans in the model response, we can use their corresponding documents to fact check claims made in the spans. The first span in the Celine Dion thread above was one example. As another example:
OLMoTrace shows that 10 documents in the model’s training data contain the same sentence as in the highlighted span. If you personally verify that these documents come from trusted sources, you can gain confidence that this piece of information generated by the model is factually correct.
Meanwhile, some highlighted spans are more generic and not specific to the topic, and the corresponding documents are less relevant to the topic as well. We use a less saturated color to highlight spans with less relevant documents, and they are labeled in the document list as “low relevance.”
In creative writing tasks there is less of a need for fact checking, but you can still trace certain LM-generated expressions back to the training documents.
You can also trace a model’s capability to evaluate arithmetic expressions and solve math problems. Here we show a basic example. When asked about this combinatorics problem (from AIME 2024), the model was able to calculate the value of a binomial number. How did it gain such capability? OLMoTrace reveals that this entire expression appears multiple times in the training data, so the model could have easily learned it through memorization.
Through OLMoTrace, you can gain insights into why the model generates certain sequences of words. In the following example, our 13B model claims that its knowledge cutoff date is August 2023. We know this isn’t true, since its pre-training data has a cutoff of December 2022. Without OLMoTrace, we would have no way to tell why the model returned this false information, but now we can see that the model may have learned to say August 2023 from the post-training examples listed on the right. This finding led us to remove content containing knowledge cutoff dates from the 32B model’s post-training data.
The use cases we shared above are only part of what can be learned from OLMoTrace. We are still in an early stage in exploring the full potential of this tool, and we are opening it up so that together as a community, we can demystify how training data affects language models.
OLMoTrace behind the scenes
At a high level, OLMoTrace scans the model output text and looks for relatively long and unique spans that have appeared verbatim in the model’s training data. Efficiently identifying these spans is an extremely challenging problem, since our language models are trained on many trillions of tokens. Below we share an overview of the OLMoTrace inference pipeline and our innovative solution to this problem.
First, OLMoTrace tokenizes the model output, and finds all spans of the token ID list that meet the following criteria:
- Existence: The span appears verbatim at least once in the training data;
- Self-contained: The span does not contain a period token (.) or a newline token (\n) except at the end of the span, and it does not begin or end with an incomplete word;
- Maximality: The span is not a subspan of another span that meets the above two criteria.
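The three criteria above can be sketched in code. This is a minimal, brute-force illustration (not the actual OLMoTrace implementation): tokens are stood in for by words, the training data is represented by a toy membership oracle `exists`, and the incomplete-word part of the self-contained check is omitted since it depends on tokenizer details.

```python
def find_spans(output_tokens, exists, boundary=frozenset({".", "\n"})):
    """Return (start, end) spans (end exclusive) of output_tokens that
    exist verbatim in the training data, are self-contained, and are
    maximal. `exists` answers whether a token tuple appears in the data."""
    n = len(output_tokens)
    candidates = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            span = tuple(output_tokens[i:j])
            if not exists(span):
                break  # any longer extension contains this span, so it cannot exist either
            # self-contained: no boundary token except at the very end
            if any(t in boundary for t in span[:-1]):
                continue
            candidates.append((i, j))
    # maximality: drop spans strictly contained in another candidate
    return [
        (i, j) for (i, j) in candidates
        if not any(a <= i and j <= b and (a, b) != (i, j)
                   for (a, b) in candidates)
    ]
```

Note the `break`: existence is monotone, so once a span fails to appear in the data, no extension of it can appear. Even so, this brute-force version performs O(L^2) membership queries, which motivates the faster algorithm described next.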
This is the most compute-heavy step in OLMoTrace. To speed up computation, we index the training data with infini-gram and develop a novel parallel algorithm, reducing the time complexity of span finding from the naive O(L^2 * N) to O(L * log N) (L = length of model output, N = size of training data).
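One ingredient of the speedup can be illustrated in isolation: because existence is monotone in span length, the longest match starting at each position can be found by binary search rather than token-by-token scanning, and the per-position searches are independent, so they parallelize. The toy sketch below replaces the infini-gram index with a hash-set oracle and is only an illustration of the idea, not the algorithm from the paper.

```python
def longest_match_len(tokens, start, exists):
    """Length of the longest span tokens[start:start+L] that appears in
    the training data. Existence is monotone in L (a longer match implies
    all its prefixes match), so we can binary-search over L."""
    lo, hi = 0, len(tokens) - start
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if exists(tuple(tokens[start:start + mid])):
            lo = mid  # a match of length mid exists; try longer
        else:
            hi = mid - 1  # no match of length mid; try shorter
    return lo
```

In the real system, each membership query is answered by the infini-gram index over the suffix array of the training data, which is what brings the log N factor into the overall complexity.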
After finding all such spans, OLMoTrace ranks them by a “span unigram probability” metric in ascending order and keeps the K spans with the smallest unigram probability, where K = 0.05 * the number of tokens in the model output. A lower span unigram probability usually means the span is relatively long and contains unique tokens rather than common expressions. These spans are highlighted in the model output.
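The ranking step can be sketched as follows. This is a simplified stand-in, assuming unigram counts over the training data are available as a plain dictionary; log-probabilities are used to avoid floating-point underflow on long spans.

```python
import math

def rank_spans(output_tokens, spans, unigram_counts, total_count, frac=0.05):
    """Keep the K spans with the smallest span unigram probability, i.e.
    the product of each token's unigram probability in the training data.
    Lower probability ~ longer spans and/or rarer tokens.
    K = frac * number of tokens in the model output."""
    def log_prob(span):
        i, j = span
        return sum(math.log(unigram_counts[t] / total_count)
                   for t in output_tokens[i:j])
    k = max(1, int(frac * len(output_tokens)))
    # ascending log-probability == ascending probability
    return sorted(spans, key=log_prob)[:k]
```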
For each kept span, OLMoTrace retrieves up to 10 document snippets enclosing this span from the training data, which are displayed in the side panel. If there are more than 10 documents for a span, a random set of 10 is sampled. If two or more spans happen to be contained in the same document, then we only show this document once (and display the context around each span).
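The sampling and deduplication logic might look like the following sketch, where `span_to_docs` is a hypothetical mapping from each kept span to the IDs of documents containing it:

```python
import random

def select_documents(span_to_docs, limit=10, seed=0):
    """For each span, keep at most `limit` documents (a random sample if
    there are more), and show each document only once even when it
    matches several spans, recording every span it covers."""
    rng = random.Random(seed)
    shown = {}  # doc_id -> list of spans this document covers
    for span, doc_ids in span_to_docs.items():
        if len(doc_ids) > limit:
            doc_ids = rng.sample(doc_ids, limit)
        for d in doc_ids:
            shown.setdefault(d, []).append(span)
    return shown
```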
Overlapping spans are merged into a single highlight. The spans we get from the above process may have partial overlap with each other, and when this happens, we merge their highlights to keep the screen decluttered. You will be able to identify such cases by selecting the span and inspecting its documents.
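Merging overlapping spans is standard interval merging; a minimal sketch:

```python
def merge_highlights(spans):
    """Merge overlapping (start, end) spans into single highlight
    intervals so the display stays uncluttered."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # overlaps the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```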
To prioritize showing the most relevant documents, in the document panel we rank all documents by a BM25 score in descending order. The per-document BM25 score is computed by treating the collection of retrieved documents as a “corpus”, and the concatenation of user prompt and LM response as the “query”. We find that this BM25 score conveys some notion of “relevance”, and thus we surface documents with higher scores on top.
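A minimal BM25 scorer over this small per-response "corpus" could look like the sketch below, assuming documents and the query have already been tokenized into word lists; the `k1` and `b` values are the common defaults, not necessarily those used in production.

```python
import math
from collections import Counter

def bm25_rank(docs, query, k1=1.5, b=0.75):
    """Rank retrieved documents (token lists) by BM25, treating them as
    the corpus and the tokenized prompt + response as the query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    qset = set(query)
    # document frequency of each query term within this small corpus
    df = Counter(t for d in docs for t in set(d) & qset)
    def score(doc):
        tf = Counter(doc)
        s = 0.0
        for t in qset:
            if df[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += (idf * tf[t] * (k1 + 1)
                  / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl)))
        return s
    return sorted(docs, key=score, reverse=True)
```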
The training data of OLMo 2 32B Instruct consists of olmo-mix-1124 (pretraining), dolmino-mix-1124 (mid-training), tulu-3-sft-olmo-2-mixture-0225 (supervised finetuning), olmo-2-0325-32b-preference-mix (preference learning), and RLVR-GSM-MATH-IF-Mixed-Constraints (RL with verifiable rewards). In total, these five datasets contain about 3.2 billion documents and 4.6 trillion tokens, and OLMoTrace matches against all of them. The training data of the other two supported OLMo models, OLMo 2 13B Instruct and OLMoE 1B 7B Instruct, is similar in size; they share the same pretraining and mid-training data as the 32B model while having slightly different post-training data.
To learn more about OLMoTrace, please read our technical paper. We also release our source code.