Asta: Accelerating science through trustworthy agentic AI
August 26, 2025
Ai2
Announcing Asta, an integrated ecosystem with an agentic research assistant, a benchmarking framework for AI agents, and developer resources
Today we announce Asta, our bold initiative to accelerate science by building trustworthy and capable agentic assistants for researchers, alongside the first comprehensive benchmarking system to bring clarity to the landscape of scientific AI. As AI use expands across the sciences, researchers in every field need helpful systems they can understand, verify, and trust. Asta is designed to fill this need.
The Asta ecosystem brings together three essential components to advance scientific AI. At its center are the Asta agents, tools to assist (not replace) human researchers performing complex, real-world scientific tasks. To promote transparency and raise the bar across the broader landscape of scientific AI, AstaBench provides a rigorous, domain-relevant benchmarking framework for evaluating and comparing any agent—not just Asta. And for AI developers, Asta resources offers a set of software components and standards to help build, test, and refine scientific AI agents.
Asta is an evolving initiative with many milestones ahead. By releasing each component along the way, we will continue to improve scientific AI agents through real-world usage and feedback from the research and AI development communities. Realizing the potential of AI for science is a shared effort, and we invite you to learn more about what we're releasing, where we're headed, and how we can move forward together.
Why do we need Asta?
Scientific research is only getting more complex. Scientists must navigate an ever-expanding body of literature, run experiments, analyze data, and iterate quickly—often while managing fragmented tools and cognitively demanding workflows. It’s easy to miss relevant findings, duplicate work, or overlook connections across disciplines. We believe that AI has the potential to help, but so far, it hasn’t delivered. Despite early signs of promise, many systems fall short: they hallucinate, lack transparency, and struggle with reproducibility. In a field where rigor is non-negotiable, that’s a problem.
Scientists remain skeptical—and AI researchers face a different challenge: without trusted standards, it’s hard to evaluate whether models are truly capable of the deep reasoning science demands.
For researchers, Asta offers agents designed to work the way scientists think—helping frame research questions, trace ideas to evidence, and clarify what’s established or still unresolved in a field. AstaBench ensures these agents are evaluated against real-world tasks, building trust and scientific rigor.
For those building scientific AI agents who need a reliable way to evaluate and improve their work, AstaBench and Asta resources provide a complete agent evaluation environment and tools to facilitate the creation of production-ready, state-of-the-art agents. AstaBench offers leaderboards and real-world benchmarks that make it easy to test, compare, and iterate on AI agents.
Asta resources includes our own open-source and baseline agents ready to fork and compare against, open language models post-trained on science, and modular tools enabling agents to perform scientific research tasks. These tools are easy to use via the Model Context Protocol (MCP) and support both reproducible, time-fixed evaluation and up-to-date real-world usage.
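The "reproducible, time-fixed evaluation" idea can be illustrated with a simple filter: any retrieval call is restricted to papers published on or before a fixed cutoff date, so a benchmark run today sees the same corpus as one run next year. This is only a minimal sketch; the record fields and function names are assumptions for the example, not a real Asta schema.

```python
from datetime import date

# Illustrative sketch of time-fixed retrieval: only papers published on or
# before a fixed cutoff date are visible to the agent under evaluation.
# The record fields here are assumptions for the example, not a real schema.

def time_fixed_search(index, query, cutoff):
    """Search `index` by title substring, then drop papers newer than `cutoff`."""
    hits = [p for p in index if query.lower() in p["title"].lower()]
    return [p for p in hits if p["published"] <= cutoff]

index = [
    {"title": "Agentic AI for Science", "published": date(2023, 5, 1)},
    {"title": "Agentic AI, Revisited",  "published": date(2025, 2, 1)},
]
# A task dated 2024-01-01 only ever sees the 2023 paper, no matter when
# the benchmark is actually run.
print(time_fixed_search(index, "agentic", date(2024, 1, 1)))
```

The same index can still serve up-to-date real-world usage by simply passing today's date as the cutoff.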
“The way research is being done, and the way code and papers are being written, is changing quite drastically, right in front of our eyes. Good literature reviews should play the role of reducing tunnel vision in the scientific community, and Asta does that very well.” — Subbarao Kambhampati, Professor of Computer Science at Arizona State University and former President of the Association for the Advancement of Artificial Intelligence (AAAI).
How is Asta different?
Unlike general-purpose AI systems, Asta is specifically built for science. We designed it for full transparency—every output is cited and traceable to its sources, supporting reproducibility, peer review, and methodological rigor. To meet the demands of complex research, Asta doesn’t just respond—it plans, executes, and iterates, autonomously carrying out critical steps. And because openness is foundational to our mission, the Asta agents and evaluation framework are open-source.
Our goal is to augment – never replace – scientists across diverse fields with AI collaborators they can trust. Asta is also designed to advance the science of AI. While many AI agents perform well in constrained settings – where goals are predefined, options are limited, and outcomes are easy to verify, such as in customer support – scientific research demands a different kind of intelligence, one capable of building on prior knowledge while introducing ideas that are novel, verifiable, and impactful.
Most current benchmarks test either general agentic AI or isolated aspects of scientific reasoning; few evaluate agentic behavior rigorously or capture the full skill set that scientific research requires. Simple strategies, like repeating a task many times and voting on the answer, may yield high accuracy, but only by consuming exorbitant resources—so it is important to measure both accuracy and cost. Scientific AI needs evaluations that reflect the real complexity of research.
AstaBench fills that gap as an open-source agent evaluation framework and suite of benchmarks for assessing AI assistants on core scientific tasks requiring reasoning. Beyond validating Asta, AstaBench advances the field by setting a rigorous standard for evaluating any agentic AI in real-world, high-stakes domains, ensuring these systems serve as valuable research partners.
We invite the broader community to build on Asta’s components and help push the boundaries of scientific AI.
“More than ever before, researchers struggle with literature search and synthesis. Ai2's Asta ecosystem of AI agents, benchmarks, and tools helps to break these barriers. Its system is poised to accelerate the path from hunch to insight, transforming how we navigate the vast landscape of scientific understanding.” — James Evans, Professor and Director of the Knowledge Lab at the University of Chicago
Asta, now and the future
Asta, an agentic tool for scientists
Asta is designed to support researchers who need to navigate large volumes of information, analyze data, and generate insights efficiently—without compromising scientific rigor. It’s specially built for people who need to regularly review literature, form hypotheses, run analyses, and communicate findings, especially those collaborating across disciplines or exploring emerging topics where connecting ideas is key. Core discovery and synthesis features from Asta, such as Ai2 Paper Finder and Ai2 Scholar QA, were first released as standalone tools. Now, informed by real-world use, they’re being refined and folded into Asta as part of a broader push towards a unified user-centric experience, with more tools on the way.
Our initial release of Asta includes three core functions:
- Find papers helps you discover relevant research using an LLM-powered search experience, like Google Scholar on steroids, that mirrors the multi-step reasoning process of expert researchers. It reformulates queries, follows citations, and explains why each paper is relevant, making it easier to find exactly what you're looking for—even when keywords fail.
- Summarize literature turns complex research questions into structured, comprehensive summaries—every claim backed by a clickable citation and often an inline excerpt. It scans millions of abstracts and full-text papers, clusters evidence, and distills findings into clear sections that highlight key results, disagreements, and open questions.
- Analyze data (available in beta for select partners) turns natural language questions into structured, reproducible analyses. It explores your dataset, generates hypotheses, runs statistical tests, and explains the results—making data-driven discovery faster, clearer, and accessible across disciplines.
In the long run, Asta aims to bring advanced AI capabilities into one intuitive interface, helping researchers work faster, think deeper, and stay focused on the science. We are training AI systems to master the full range of skills expected of a developing research assistant. Once their performance and explanatory abilities meet the bar for real scientific value, they’ll be incorporated into Asta.
Here are some of the skills that we plan to release in the future:
- Experiment replication: Quickly duplicate computational experiments described in research papers by finding online repos and data and loading the appropriate packages.
- Hypothesis generation: Explore and refine testable research questions grounded in existing evidence. Automatically identify discoveries that hold promise, using AI as a research assistant.
- Scientific programming: Write code to perform computational experiments, including data cleaning, simple machine learning, and simulations.
“I use Asta daily to weekly—I work with it to help inform our therapeutic discovery programs at NewLimit. Science is mostly a game of choosing which experiment to run, and exactly how to run it. Improving our performance on those selection and design tasks means we're more likely to make discoveries for each unit of time and capital invested.” — Jacob Kimmel, Co-Founder and President of NewLimit
AstaBench, a benchmarking framework to evaluate AI agents on scientific tasks
Today we’re releasing the first version of our evaluation framework and benchmarking suite, AstaBench, which contains over 2,400 problems across 11 benchmarks in four core categories: literature understanding, code and execution, data analysis, and end-to-end discovery. AstaBench helps scientists identify which agents best support their needs through task-relevant leaderboards. It also gives AI developers a standard execution environment and tools for testing their agents’ scientific reasoning against a large, integrated collection of well-known baselines from the literature, including both open and closed LLM foundation models and agents.
A key feature is AstaBench’s reporting of the Pareto frontier across reasoning accuracy and computational cost, making tradeoffs transparent—similar to approaches like ARC-AGI. It also includes agent tools like date-restricted retrieval, which limits agents to open-access scientific papers published before a task’s “research date.” This ensures tests remain reproducible even as science advances and enables apples-to-apples evaluation of agent reasoning, akin to giving every student the same calculator and open textbook.
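The Pareto-frontier idea can be made concrete with a short sketch. This is not AstaBench's actual implementation, and the agent names and numbers below are hypothetical; it only shows what "reporting the frontier across accuracy and cost" means: keep every agent that no other agent beats on both dimensions at once.

```python
# Illustrative sketch of a cost-accuracy Pareto frontier, as an
# AstaBench-style leaderboard might report it. Agent names and numbers
# below are hypothetical, for demonstration only.

def pareto_frontier(agents):
    """Return agents not dominated by any other agent.

    An agent is dominated if some other agent is at least as accurate
    and at least as cheap, and strictly better on one of the two.
    """
    frontier = []
    for a in agents:
        dominated = any(
            b["accuracy"] >= a["accuracy"] and b["cost"] <= a["cost"]
            and (b["accuracy"] > a["accuracy"] or b["cost"] < a["cost"])
            for b in agents
        )
        if not dominated:
            frontier.append(a)
    # Sort by cost so the frontier reads left to right on a plot.
    return sorted(frontier, key=lambda a: a["cost"])

agents = [
    {"name": "agent-small", "accuracy": 0.20, "cost": 0.05},
    {"name": "agent-mid",   "accuracy": 0.31, "cost": 0.15},
    {"name": "agent-large", "accuracy": 0.53, "cost": 1.20},
    {"name": "agent-waste", "accuracy": 0.28, "cost": 0.90},  # dominated by agent-mid
]
print([a["name"] for a in pareto_frontier(agents)])  # agent-waste is excluded
```

A leaderboard built this way makes the tradeoff explicit: a cheap agent with modest accuracy and an expensive agent with high accuracy can both sit on the frontier, while an agent that pays more for less never does.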
Learn more in the AstaBench technical blog post.
Testing 57 AI agents across 22 different architectures shows progress but clear limitations—only 18 handled all benchmarks, with modest scores overall. Our domain-specific Asta v0 (which uses a mixture of LLMs) led at 53.0%, ~10 points above ReAct-gpt-5, though at higher engineering and runtime costs. Cheaper setups like ReAct-claude-3-5-haiku (20%) and ReAct-gpt-5-mini (31%) offered strong cost-performance tradeoffs. Data analysis remains the hardest domain, while literature review tasks are most mature, where our Asta Paper Finder and Asta Scholar QA agents matched or outperformed rivals.
Note: Asta v0 is an experimental agent not used in the production version of Asta today. The open-source versions of Paper Finder and Scholar QA also may differ from the production versions; for example, while Paper Finder production is based on the open-source library, there are some proprietary artifacts, including artifacts tied to our UI infrastructure, that we aren’t able to open-source.
Asta resources, a set of tools and standards for AI developers
Asta resources is a set of first-party Asta and baseline agents, methods for searching and navigating the scientific literature, and agent tools that are fully integrated with AstaBench to provide a complete environment for AI developers to build, test, and refine trustworthy scientific AI agents.
It includes the Scientific Corpus Tool, an MCP extension of our Semantic Scholar API that provides agents free access to a 200M+ normalized index of papers, already serving over 1.5 billion queries per year. This new tool, designed for seamless use with agents via MCP, enables both sparse and dense full-text semantic search across open-access papers. It also offers additional functions that make it easy to implement agents that follow common graph-based discovery strategies, such as starting from one paper to find more recent work in the area that cites it, or identifying other papers written by the same authors.
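A graph-based discovery strategy of the kind described above can be sketched in a few lines. Everything here is an illustrative assumption, not the Scientific Corpus Tool's actual MCP schema: the tool names (`get_paper`, `get_citations`, `get_author_papers`), the record fields, and the in-memory client standing in for a real MCP connection are all hypothetical.

```python
# Hypothetical sketch of graph-based discovery: start from a seed paper,
# follow its citations forward, and branch to co-author publications.
# Tool names, record fields, and the client are illustrative assumptions,
# not the Scientific Corpus Tool's actual MCP interface.

class FakeClient:
    """Tiny in-memory stand-in for an MCP client, for demonstration only."""

    def __init__(self, papers, citations, author_papers):
        self.papers = papers                 # paper_id -> paper record
        self.citations = citations           # paper_id -> citing papers
        self.author_papers = author_papers   # author_id -> that author's papers

    def call(self, tool, args):
        if tool == "get_paper":
            return self.papers[args["paper_id"]]
        if tool == "get_citations":
            return self.citations.get(args["paper_id"], [])
        if tool == "get_author_papers":
            return self.author_papers.get(args["author_id"], [])
        raise ValueError(f"unknown tool: {tool}")

def expand_from_seed(client, seed_paper_id, max_papers=50):
    """Breadth-first expansion from one paper to citing papers and
    other papers by the same authors, up to max_papers results."""
    seen, frontier, results = set(), [seed_paper_id], []
    while frontier and len(results) < max_papers:
        paper_id = frontier.pop(0)
        if paper_id in seen:
            continue
        seen.add(paper_id)
        paper = client.call("get_paper", {"paper_id": paper_id})
        results.append(paper)
        # Follow citations forward in time.
        for citing in client.call("get_citations", {"paper_id": paper_id}):
            frontier.append(citing["paper_id"])
        # Branch out to co-author publications.
        for author in paper["authors"]:
            for other in client.call("get_author_papers",
                                     {"author_id": author["id"]}):
                frontier.append(other["paper_id"])
    return results

papers = {
    "p1": {"paper_id": "p1", "title": "Seed paper",     "authors": [{"id": "a1"}]},
    "p2": {"paper_id": "p2", "title": "Cites the seed", "authors": [{"id": "a2"}]},
    "p3": {"paper_id": "p3", "title": "Same author",    "authors": [{"id": "a1"}]},
}
client = FakeClient(
    papers,
    citations={"p1": [{"paper_id": "p2"}]},
    author_papers={"a1": [{"paper_id": "p3"}]},
)
found = expand_from_seed(client, "p1")
print([p["paper_id"] for p in found])
```

Against a real MCP server, `FakeClient` would be replaced by an actual MCP session, but the traversal logic, deduplicating visited papers while alternating between citation and authorship edges, stays the same.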
Perhaps even more impactful is Asta resources' comprehensive library of open-source agents and open language models that have been post-trained for science.
We believe these developer tools will democratize the field, dramatically lowering the time and cost required to develop truly intelligent agents that can reason deeply about science—and indeed about other aspects of the world as well.
Learn more in the Asta resources details page.
“Compared to other AI tools I often use, Asta goes deeper into the possibilities of the data. It doesn’t just point to the obvious patterns in the data; it gets into some of the more nuanced possibilities of the problem. In my research, not only has Asta helped me do much more analysis with complex plots and modeling, but it has also enabled me to view its internal narrative, making sure the analyses are robust and communication-worthy.” — Sanchaita Hazra, Principal Statistician at DeepFlux
Summary
Asta is a bold initiative that will continue to grow and evolve. While the agentic AI field is moving rapidly, we will stay true to the long-term vision of accelerating science and the values of being transparent and scientifically grounded.
Crucially, true to Ai2’s mission and the Asta project’s aim to advance the capabilities of agents, AstaBench and all the agents in the baseline set – which include versions of the production Asta agents – are fully open-source. Developers and researchers have the ability to inspect, fine-tune, and adapt them for their own purposes, as well as deploy those agents privately.
Whether you’re a researcher needing a trustworthy AI research assistant, an AI developer looking for accessible tools and benchmarks, or anyone interested in scientific AI agents, we invite you to join us on the journey.