AutoDS: A prototype engine for autonomous, open-ended scientific discovery

July 17, 2025

Bodhisattwa Majumder, Dhruv Agarwal - Ai2

AI-powered science continues to advance in exciting ways. Today, we’re launching AutoDS—a prototype open-source engine for autonomous, open-ended scientific discovery using large language models (LLMs). AutoDS – short for Autonomous Discovery via Surprisal – goes beyond standard data crunching by building upon its own findings and uncovering insights that may not be immediately apparent even to experienced researchers.

Our experiments show that AutoDS can spot unexpected findings, opening the door to a new era of AI-assisted breakthroughs. While it isn’t always speedy, AutoDS can handle what’s normally an arduous task without oversight or intervention, allowing researchers to devote their attention to more meaningful work.

In the scientific discovery process, not only is it important to find answers to pressing research questions, but it is critical to figure out which questions to ask in the first place. Scientists generate and reformulate hypotheses that are most likely to lead to novel, impactful scientific discoveries. It’s ultimately how progress gets made.

Of the AI systems focused on automating aspects of scientific discovery, few address this key question-asking step. Most operate within a “goal-driven” setting: given some data, the user is required to provide a research question, which prompts a model to generate a hypothesis, propose an experiment, perform the experiment, and analyze the results to derive a conclusion.

AutoDS is different: it does not require a hypothesis from an expert. Akin to a human scientist, AutoDS explores more broadly, generating hypotheses according to its own measures of research promise. It uses the results of the (statistical) experiments it generates and conducts to propose new hypotheses, in a never-ending process.

There have been previous attempts to build a system like AutoDS—a system that can autonomously iterate and refine hypotheses through experimentation. However, these attempts have largely relied on methods prone to bias and unreliability. It can be incredibly challenging for humans – let alone AI systems – to identify lines of inquiry that lead to novel discoveries, and to avoid subjectivity influencing which hypotheses are pursued in the end.

AutoDS addresses this with a new approach built upon the concept of Bayesian surprise, which quantifies how “surprising” a finding is to an observer – the LLM underpinning AutoDS in this case – and directs search to further pursue those surprising leads.

Here, surprise is computed by measuring the change in an observer’s belief about a hypothesis before and after seeing experimental evidence, i.e., the difference between their prior and posterior beliefs. This design decision was motivated by findings showing that the improbability, or surprisal, of a hypothesis is often a strong predictor of scientific impact. LLM surprisal is then used as the reward signal to guide AutoDS' hypothesis generation via Monte Carlo Tree Search (MCTS, of AlphaGo fame), combined with a technique called progressive widening that accounts for the unbounded search space of scientific discovery. Our findings reveal that MCTS is crucial to the efficacy of AutoDS, outperforming alternative search mechanisms by up to 29% in terms of the number of surprising hypotheses discovered.
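To make the idea of a belief shift concrete, here is a minimal sketch and not AutoDS's actual implementation. It assumes the observer's belief in a hypothesis can be summarized as a single probability, estimated from repeated yes/no judgments sampled from an LLM, and it measures surprise as the KL divergence from the prior belief to the posterior belief. The function names, the Laplace smoothing, and the example votes are all hypothetical.

```python
import math
from typing import Sequence


def belief_from_votes(votes: Sequence[bool]) -> float:
    """Laplace-smoothed fraction of 'the hypothesis is true' votes (assumed belief model)."""
    return (sum(votes) + 1) / (len(votes) + 2)


def bernoulli_kl(posterior: float, prior: float, eps: float = 1e-9) -> float:
    """KL(posterior || prior) in nats between two Bernoulli beliefs."""
    # Clamp away from 0 and 1 so the logarithms stay finite.
    q = min(max(posterior, eps), 1 - eps)
    p = min(max(prior, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


# Hypothetical LLM judgments before and after seeing the experiment's result.
prior_votes = [True, True, True, False, True]
posterior_votes = [False, False, True, False, False]

prior = belief_from_votes(prior_votes)
posterior = belief_from_votes(posterior_votes)
surprise = bernoulli_kl(posterior, prior)
print(f"prior={prior:.3f} posterior={posterior:.3f} surprise={surprise:.3f} nats")
```

Under these assumptions, a hypothesis the observer initially believed and then largely rejected after the experiment scores a high surprise, while an experiment that leaves the belief unchanged scores zero.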

Bayesian surprise gives AutoDS a way to understand how surprising or unexpected new information is, given what it already “knows,” while MCTS with progressive widening helps generate new hypotheses from these surprising findings.
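The search side can be sketched in the same spirit. The code below is an illustrative toy, not the AutoDS codebase: it shows MCTS with a common form of progressive widening, where a node may add a new child only while its child count is below k * visits^alpha, and otherwise descends to an existing child chosen by a UCT score. The `propose` and `reward` functions are hypothetical stand-ins for LLM hypothesis generation and the surprise reward, and the constants `k`, `alpha`, and `c` are arbitrary.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    hypothesis: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def may_widen(node: Node, k: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening: allow a new child only while |children| < k * visits^alpha."""
    return len(node.children) < k * max(node.visits, 1) ** alpha


def uct_child(node: Node, c: float = 1.4) -> Node:
    """Pick the child with the best mean reward plus exploration bonus."""
    def score(ch: Node) -> float:
        bonus = c * math.sqrt(math.log(node.visits) / ch.visits)
        return ch.total_reward / ch.visits + bonus
    return max(node.children, key=score)


def search(root: Node,
           propose: Callable[[Node], str],
           reward: Callable[[Node], float],
           iterations: int) -> Node:
    for _ in range(iterations):
        node = root
        # Selection: descend while the current node is not allowed to widen.
        while node.children and not may_widen(node):
            node = uct_child(node)
        # Expansion: derive one new hypothesis from the selected node.
        child = Node(propose(node), parent=node)
        node.children.append(child)
        # Evaluation: in AutoDS this would be the surprise of the tested hypothesis.
        r = reward(child)
        # Backpropagation: update statistics along the path to the root.
        cur: Optional[Node] = child
        while cur is not None:
            cur.visits += 1
            cur.total_reward += r
            cur = cur.parent
    return root


# Hypothetical stand-ins for the LLM proposer and the surprise reward.
def propose(node: Node) -> str:
    return f"{node.hypothesis}.{len(node.children)}"


def reward(node: Node) -> float:
    return (len(node.hypothesis) % 3) / 2


root = search(Node("root"), propose, reward, iterations=50)
print(f"root visits: {root.visits}, root children: {len(root.children)}")
```

The widening rule is what lets the tree cope with an unbounded set of possible hypotheses: the number of children grows slowly with the visit count, so the search keeps deepening promising branches instead of proposing endless siblings at the root.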

Promising Early Results

We conducted a series of experiments to evaluate whether AutoDS could consistently generate surprising hypotheses in subjects including economics, biology, and finance. In the first set of tests, we had a large language model evaluate AutoDS' hypotheses. In the second set, human annotators with STEM MS and PhD degrees served as the evaluators.

Evaluated across 21 real-world datasets, AutoDS outperformed competing approaches by 5–29% at finding discoveries that are surprising to the LLM. In the human study (which involved more than 500 hypotheses), 67% of the discoveries made by AutoDS were also surprising to the human evaluators.

Of course, open-ended AI systems like AutoDS must be met with careful academic scrutiny and rigorous peer review. Still, the promise shown by AutoDS, and our other recent releases like CodeScientist and the Genesys prototype, hints at what could be possible: a future where AI accelerates discovery and helps uncover the unexpected.

We encourage you to try AutoDS for yourself here. Read the paper here.
