Ai2 Newsletter
September 2025
Top story - Accelerating trustworthy agents for science
Scientific research grows more logistically complex by the day. Scientists have to juggle navigating vast bodies of literature, running experiments, analyzing data, and iterating—quickly. Making matters worse, fragmented tools and unintuitive workflows can easily lead to missed relevant findings, hamstringing the discovery process.
That’s why we developed Asta, an ecosystem of AI tools for science, including a capable agentic research assistant. We believe AI has the potential to support and accelerate scientific discovery, and that Asta, the culmination of years of research, is a significant step in that direction.
What’s unique about Asta is that it’s designed to work the way scientists think, helping frame research questions and tracing ideas back to evidence. Importantly, Asta’s components are backed by benchmark evaluations that keep them robust, transparent, and grounded in science itself.
Jonathan Whitaker, a data scientist at the AI R&D lab Answer.AI, says that Asta’s paper-finding tools surface research artifacts that other agentic assistants miss.
“We're at ~2,000 papers in our shared library, so we'll often hit questions like, ‘Didn't they do this before in that paper?' and then we'll have a minute or two of nobody remembering the name to go check,” he says. “Having good paper search for something specific like this is so nice, and Asta has nailed it the times I've needed it.”
Asta can also facilitate literature reviews—comprehensive syntheses of scholarly sources on particular subjects. It turns research questions into structured summaries, drawing on a library of millions of papers.
“[When] some new thing comes out or we're looking at something specific, it's nice to get basically a quick lit review of what else is related [or] narrow in on some specific thread of a work,” adds Whitaker.
David Hendrickson, CEO and Principal Consultant at GenerAIte Solutions, uses Asta to explain new technologies—and even debunk marketing claims around products.
“From my perspective, this is the best tool on the market for academic-based research,” he says. “Nothing else comes close.”
There’s even more to Asta than powerful agents for literature search. AstaBench is an environment and toolset for testing the scientific reasoning capabilities of agents, while Asta resources is a collection of baseline models and methods for building, testing, and refining agents for scientific work.
We plan to expand Asta soon, first with a data analysis capability that can generate hypotheses, run statistical tests, and explain the results succinctly. Down the road, the aim will be to introduce features that support replicating computational experiments and refining research questions, as well as writing code for scientific simulations, simple machine learning, and more.
Our goal isn’t to replace scientists or automate science end-to-end. Rather, we see Asta as an AI collaborator—one that’s empirically validated and trustworthy. Importantly, Asta will always be largely open, allowing researchers to use it how they want and adapt it to their needs.
Learn more about Asta in our blog, and get started here.
Ai2 receives combined $152M from NSF, NVIDIA
Ai2 has been awarded $75 million from the U.S. National Science Foundation (NSF) and $77 million from NVIDIA as part of a jointly funded project to create a national-level AI infrastructure and accelerate our research around fully open AI models for science.
Reasoning in 3D space
Our new MolmoAct model is the first Action Reasoning Model, capable of reasoning in 3D space. Built on Molmo, our fully open multimodal model, MolmoAct bridges the gap between language and action, bringing the intelligence of state-of-the-art AI models into the physical world.
Signal and Noise
Our new research shows that AI benchmarks with higher signal-to-noise ratios are more reliable for making decisions about model training. To support further research on this topic, we've released a dataset of 900K evaluations across 465 open-weight models, including OLMo, DataDecide, and ladder scaling-law models. We plan to continue using these metrics to bolster our evaluation infrastructure—and we believe a focus on the "evaluation of evaluation" can improve the way we build models moving forward.
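For intuition, here’s a minimal sketch of what a signal-to-noise ratio for a benchmark might look like: the signal is the score spread that separates models, and the noise is the jitter in a single model’s score across nearby training checkpoints. The function and numbers below are illustrative assumptions, not the paper’s exact definition.

```python
import numpy as np

def benchmark_snr(final_scores, checkpoint_scores):
    """Toy signal-to-noise ratio for a benchmark (illustrative only)."""
    signal = np.std(final_scores)      # score spread across different models
    noise = np.std(checkpoint_scores)  # score jitter across one model's
                                       # final training checkpoints
    return signal / noise

# Hypothetical accuracies: five models' final scores, plus one model's
# scores over its last few checkpoints.
final_scores = [0.41, 0.47, 0.52, 0.58, 0.63]
checkpoint_scores = [0.51, 0.53, 0.52, 0.50, 0.52]
print(f"SNR: {benchmark_snr(final_scores, checkpoint_scores):.1f}")
```

A benchmark with a high ratio can tell two training runs apart; one with a low ratio mostly measures its own randomness.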
Paper Finder open-sourced
We open‑sourced Paper Finder, our LLM‑powered literature‑search agent that surfaces scientific papers other tools miss. Paper Finder mirrors how you hunt for related work: it breaks down your natural‑language query, follows citations, reranks results, and even explains why each paper matters.
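To make that pipeline concrete, here’s a minimal sketch of the decompose-retrieve-expand-rerank loop the description implies. Everything below (the Paper class, find_papers, the scoring rule) is a hypothetical stand-in, not Paper Finder’s actual API.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    abstract: str
    citations: list[str]  # titles of papers this one cites

def find_papers(query: str, corpus: dict[str, Paper], top_k: int = 5):
    # 1. Break the natural-language query into simple keyword facets.
    facets = [w for w in query.lower().split() if len(w) > 3]

    # 2. Retrieve papers whose abstracts mention any facet.
    hits = {t: p for t, p in corpus.items()
            if any(f in p.abstract.lower() for f in facets)}

    # 3. Follow citations one hop out to catch papers keyword search missed.
    for paper in list(hits.values()):
        for cited in paper.citations:
            if cited in corpus:
                hits.setdefault(cited, corpus[cited])

    # 4. Rerank by facet overlap and attach a one-line explanation.
    def score(p: Paper) -> int:
        return sum(f in p.abstract.lower() for f in facets)

    ranked = sorted(hits.values(), key=score, reverse=True)[:top_k]
    return [(p.title, f"matches {score(p)} query facet(s)") for p in ranked]
```

The citation-following step is what separates this shape of search from plain keyword retrieval: a relevant paper that never uses your vocabulary can still surface because something you did find cites it.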
Olmo-2-1B checkpoints
We released early pre-training checkpoints for OLMo-2-1B to help study how LLM capabilities emerge. They’re fine-grained snapshots intended for analysis, reproduction, and comparison.
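Intermediate checkpoints like these are typically exposed as revisions of a Hugging Face model repo. Here’s a minimal sketch of loading one with transformers; the repo id and revision tag are assumptions, so check the model card for the actual checkpoint names.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "allenai/OLMo-2-0425-1B"          # assumed repo id; verify on the Hub
REVISION = "stage1-step10000-tokens21B"  # hypothetical checkpoint tag

tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(REPO, revision=REVISION)

# Quick sanity check: sample a few tokens from the early checkpoint.
inputs = tokenizer("The history of science", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Comparing the same prompt across successive revisions is one simple way to watch a capability emerge over the course of pre-training.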
olmOCR updated
olmOCR v0.2.1 has arrived with new models. Our OCR engine now reads tougher documents with greater precision, and it’s still 100% open source. Plus, brand-new trainer code lets you recreate our checkpoints or fine-tune your own models with just a few commands.
New eval for question-answering
MoNaCo, our new eval for cross-source question answering, tests how well models stitch together evidence across dozens (or hundreds) of sources. When we evaluated GPT-5, Claude Opus 4, Gemini 2.5 Pro, and DeepSeek R1, even the strongest models struggled.
Olmo Discord bot
You can now chat with OLMo 2 32B Instruct, our most capable language model, directly in our Discord. Type "@AskOlmo" (without quotes) to ask about research, code, or curiosities—responses come in real time.