Ai2

Introducing AutoDiscovery: Automated scientific discovery, now in AstaLabs

February 12, 2026

Ai2


Every dataset holds findings no one has looked for yet. The patterns are there – waiting in the rows and columns – but surfacing them requires asking the right questions. In science, knowing which questions to ask is often the hardest part, and most of today's AI tools don't help. They're goal-driven, waiting for you to provide a research question before they get to work.

This is true of everything from traditional statistical workflows to sophisticated multi-agent research systems like Google's AI co-scientist and FutureHouse's platforms. These tools can synthesize literature, design experiments, and validate hypotheses at scale—but they still wait for you to tell them what to investigate.

That's why we built AutoDiscovery (formerly AutoDS), now available in AstaLabs as an experimental feature. Instead of starting with a question, AutoDiscovery starts with your data and asks its own questions: it generates hypotheses in natural language, proposes experiment plans, writes Python code to execute them, interprets the statistical results, and uses what it learns to generate new hypotheses. Give it a structured dataset and let it explore. Whether you run a quick analysis or hundreds of experiments overnight, you'll receive a full list of potential new research directions, each one reproducible and ready to investigate further.
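The cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not AutoDiscovery's implementation: each step is passed in as a function (`propose`, `make_plan`, `write_code`, `execute` and `interpret` are hypothetical names standing in for language-model calls and sandboxed code execution), and the loop simply tests one hypothesis after another, whereas AutoDiscovery chooses where to branch next using the search strategy described later in this post.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experiment:
    """One completed iteration, kept so the result can be audited and reproduced."""
    hypothesis: str
    plan: str
    code: str
    result: str
    interpretation: str


def discovery_loop(
    propose: Callable[[List[Experiment]], str],  # hypothetical: LLM proposes a hypothesis from past results
    make_plan: Callable[[str], str],              # hypothetical: LLM drafts an experiment plan
    write_code: Callable[[str], str],             # hypothetical: LLM writes Python for the plan
    execute: Callable[[str], str],                # hypothetical: sandboxed execution, returns its output
    interpret: Callable[[str, str], str],         # hypothetical: LLM reads the statistical output
    budget: int,                                  # number of hypotheses to test
) -> List[Experiment]:
    history: List[Experiment] = []
    for _ in range(budget):
        hypothesis = propose(history)  # new hypotheses are conditioned on everything learned so far
        plan = make_plan(hypothesis)
        code = write_code(plan)
        result = execute(code)
        history.append(
            Experiment(hypothesis, plan, code, result, interpret(hypothesis, result))
        )
    return history
```

Because `propose` receives the full history, later hypotheses can build on earlier results, and each returned `Experiment` is the kind of reproducible record (hypothesis, plan, code, result) described above.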

Researchers are already using AutoDiscovery to surface surprising, hidden patterns across disciplines, from uncovering trophic relationships in 20 years of marine ecosystem data to identifying mutual-exclusivity patterns in cancer mutations that could inform treatment decisions. Several of AutoDiscovery’s findings in social science were even published in a peer-reviewed paper last November (after independent verification). 

AutoDiscovery changes the relationship between scientists and their data—transforming datasets from static repositories into interactive artifacts for inquiry that surface impactful new questions.

Read case studies and detailed technical write-ups from our early partners here, and read on for more information about AutoDiscovery and how it works.

"AutoDiscovery is almost like deep research with data, but at the speed of thought." — Sanchaita Hazra, Economist in the College of Social and Behavioral Science at the University of Utah

How AutoDiscovery decides what to pursue

AI systems tasked with open-ended exploration face a classic failure mode: without a principled way to decide which leads are worth pursuing, they either wander randomly or inherit whatever biases are baked into their training data. AutoDiscovery solves this with a key insight: guide the search using Bayesian surprise, namely a measure of how much the system's beliefs change after seeing evidence.

Before running an experiment, AutoDiscovery holds a prior belief about whether a hypothesis is true, represented as a probability distribution. (This belief comes from the world knowledge in the underlying language model and is extracted by querying the model multiple times.) After seeing the results, it updates to a posterior belief. The surprise is the magnitude of that shift: it quantifies how much information the experimental data provided, that is, how strongly the data forced the system to revise its internal estimates.

Importantly, AutoDiscovery tracks not just how large the belief change (surprise) is, but also the direction of that change. A positive shift means the evidence moved the system toward believing the hypothesis is more likely true, while a negative shift means the evidence moved it toward believing the hypothesis is less supported in the dataset. A negative shift can still be highly surprising and sometimes especially valuable, as when evidence contradicts prevailing wisdom.
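To make magnitude and direction concrete, here is a minimal sketch that assumes each belief is a Beta distribution fitted to repeated yes/no answers from the language model, that the posterior is elicited the same way after the model sees the results, and that surprise is the KL divergence KL(posterior ‖ prior), the classic definition of Bayesian surprise. The fitting rule in `belief_from_votes` and the exact scoring are assumptions for illustration; AutoDiscovery's actual surprisal formula may differ. The sign of the change in the belief's mean gives the direction.

```python
import math
from typing import Tuple

Beta = Tuple[float, float]  # (alpha, beta) parameters of a Beta distribution


def digamma(x: float) -> float:
    """Digamma function for x > 0: shift x upward, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # recurrence: psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def log_beta_fn(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def belief_from_votes(true_votes: int, total_votes: int) -> Beta:
    """Hypothetical fitting rule: k 'true' answers out of n queries -> Beta(1 + k, 1 + n - k)."""
    return (1.0 + true_votes, 1.0 + total_votes - true_votes)


def mean(belief: Beta) -> float:
    a, b = belief
    return a / (a + b)


def bayesian_surprise(prior: Beta, posterior: Beta) -> float:
    """KL(posterior || prior) between two Beta distributions, in nats."""
    a1, b1 = posterior
    a0, b0 = prior
    return (
        log_beta_fn(a0, b0) - log_beta_fn(a1, b1)
        + (a1 - a0) * digamma(a1)
        + (b1 - b0) * digamma(b1)
        + (a0 - a1 + b0 - b1) * digamma(a1 + b1)
    )


def signed_surprise(prior: Beta, posterior: Beta) -> Tuple[float, str]:
    """Surprise magnitude plus the direction in which the belief moved."""
    shift = mean(posterior) - mean(prior)
    direction = "positive" if shift > 0 else "negative" if shift < 0 else "none"
    return bayesian_surprise(prior, posterior), direction


# Illustrative numbers only: 4 of 8 samples say "true" before the experiment, 7 of 8 after.
prior = belief_from_votes(4, 8)       # Beta(5, 5), mean 0.5
posterior = belief_from_votes(7, 8)   # Beta(8, 2), mean 0.8
surprise, direction = signed_surprise(prior, posterior)
print(f"surprise = {surprise:.3f} nats, direction = {direction}")
```

With a symmetric prior, a shift down to Beta(2, 8) scores the same magnitude as the shift up to Beta(8, 2) but is labeled "negative", which is why tracking direction separately matters.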

This design reflects a familiar scientific intuition: results that meaningfully shift our expectations are often more interesting than those that simply confirm what we already assumed. By chasing surprise, AutoDiscovery naturally gravitates toward the unexpected—the results most likely to represent genuine discoveries rather than obvious patterns.

But surprise alone isn't enough. The space of possible scientific questions is effectively infinite, and exploring it requires intelligent search. AutoDiscovery uses Monte Carlo Tree Search (MCTS) to navigate this space efficiently. MCTS balances exploring new hypotheses with prioritizing known leads, allocating computational effort toward the most fruitful branches of inquiry. 
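The sketch below shows the textbook MCTS ingredients applied to a tree of hypotheses: select a branch with the UCT rule (average reward plus an exploration bonus for rarely visited branches), expand it with a new hypothesis, score that hypothesis, and back-propagate the score to every ancestor. It is a sketch under stated assumptions, not AutoDiscovery's code: the reward is assumed to be each experiment's Bayesian surprise, the fixed branching factor `max_children` and exploration constant `c = 1.4` are illustrative choices, and `expand` and `evaluate` are hypothetical callbacks standing in for hypothesis generation and running an experiment.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    hypothesis: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0  # assumed reward: sum of surprise scores in this subtree

    def add_child(self, hypothesis: str) -> "Node":
        child = Node(hypothesis, parent=self)
        self.children.append(child)
        return child


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCT: exploit high average surprise, plus a bonus for rarely visited branches."""
    if child.visits == 0:
        return math.inf  # always try an unvisited branch first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select(root: Node, max_children: int) -> Node:
    """Descend through fully expanded nodes, taking the child with the best UCT score."""
    node = root
    while len(node.children) >= max_children:
        parent = node
        node = max(parent.children, key=lambda ch: uct_score(ch, parent.visits))
    return node


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Credit the reward to the new node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def mcts_step(
    root: Node,
    expand: Callable[[Node], str],     # hypothetical: propose a follow-up hypothesis
    evaluate: Callable[[str], float],  # hypothetical: run the experiment, return its surprise
    max_children: int = 3,             # must be >= 1
) -> Node:
    leaf = select(root, max_children)
    child = leaf.add_child(expand(leaf))
    backpropagate(child, evaluate(child.hypothesis))
    return child
```

With `max_children=2`, the first two steps branch from the root; after that, UCT sends the search down whichever branch has produced more surprise, until a neglected branch's exploration bonus grows large enough to win.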

Together, Bayesian surprise and MCTS give AutoDiscovery a principled, scalable way to collaborate with human researchers on the question: "What should be investigated next?"

"Analyses that would normally require weeks or months of manual exploratory modeling were done in a single day." — Dr. Stephen Salerno, Postdoctoral Researcher in Biostatistics at Fred Hutchinson Cancer Center

Navigating what AutoDiscovery finds

AutoDiscovery started as a research project we published last year with open-source code, but until now there hasn't been an easy, hosted way for scientists to run it. After working with early partners across scientific domains, we're ready to put it in more researchers' hands.

In AstaLabs, AutoDiscovery's progress and findings appear in a table (see left panel above) that populates as experiments complete. Each row represents a hypothesis the system has tested. The Surprisal score shows how new evidence shifted the system's belief from Before to After, quantifying how much each finding challenged or confirmed the system's expectations (see right panel above). You can also follow the discovery process as the search tree (middle panel above) grows, showing the sequence of hypotheses generated.

Click any row in the table or node in the search tree to inspect the details.

Case study: Discovering mutual exclusivity in cancer mutations

To make this concrete, consider what AutoDiscovery found when exploring a dataset of breast cancer mutations with oncologists from the Paul G. Allen Research Center at the Swedish Cancer Institute.

Starting from a broad search over mutation co-occurrence patterns, the system generated and tested a series of increasingly specific hypotheses. One branch of inquiry led to a genomics finding: among patients with PIK3CA mutations, TP53 mutations appear less frequent than expected by chance. This is a potential mutual-exclusivity pattern—a signal that the two mutations may be functionally redundant or that cells carrying both may be non-viable.
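AutoDiscovery writes its own analysis code for each hypothesis, and this post doesn't say which test it ran here. As an illustration only, one standard way to check whether two mutations co-occur less often than chance is a one-sided Fisher's exact test: conditioning on how many patients carry each mutation, the number carrying both follows a hypergeometric distribution, and the p-value is the probability of seeing that few co-occurrences or fewer. The function name and the counts in the example are hypothetical, not taken from the breast cancer dataset.

```python
from math import comb
from typing import Tuple


def co_occurrence_depletion_test(
    n_patients: int, n_a: int, n_b: int, n_both: int
) -> Tuple[float, float]:
    """One-sided Fisher's exact test for mutual exclusivity of mutations A and B.

    Returns the number of co-mutated patients expected under independence and
    the p-value P(X <= n_both), where X ~ Hypergeometric(n_patients, n_a, n_b).
    """
    expected = n_a * n_b / n_patients
    lowest = max(0, n_a + n_b - n_patients)  # smallest overlap the margins allow
    tail = sum(
        comb(n_a, k) * comb(n_patients - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    p_value = tail / comb(n_patients, n_b)  # exact integer arithmetic until this division
    return expected, p_value


# Hypothetical toy counts: 10 patients, 5 with mutation A, 5 with mutation B, none with both.
expected, p = co_occurrence_depletion_test(10, 5, 5, 0)
print(f"expected co-mutations = {expected}, one-sided p = {p:.4f}")
```

SciPy's `scipy.stats.fisher_exact` with `alternative="less"` computes the same one-sided tail for the 2×2 table `[[both, A only], [B only, neither]]`.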

AutoDiscovery's prior belief about this hypothesis was neutral: the prior distribution had a mean of 0.50 (where 0 = believed false and 1 = believed true), reflecting uncertainty about whether the pattern would hold. After the system analyzed the data, the posterior mean shifted sharply to 0.82. This large belief update registered as strong surprisal, which is why the system flagged the finding and continued exploring related hypotheses.

The collaborating oncologists found the signal striking. AutoDiscovery had surfaced a plausible mutual-exclusivity relationship from a search space far too large to explore by hand—and immediately suggested concrete follow-ups to validate and interpret the finding.

"AutoDiscovery's ability to reveal discoveries that may be hiding in plain sight is especially valuable in cancer research." — Dr. Kelly Paulson, Medical Oncologist and Head of the Center for Immuno-Oncology at the Swedish Cancer Institute

Getting started in AstaLabs

Log in to AstaLabs and try the Example Sessions dataset to see the workflow end-to-end before uploading your own data. When you're ready to run your own analysis:

  1. Set up your session. Click + New exploration to open the wizard. Upload your files (CSV, JSON, Parquet, etc.) and describe your context to seed the system's beliefs. If you’re iterating, you can paste learnings from previous runs in the Intent field via Advanced Settings to refine the search. Finally, set the experiment budget to control how many hypotheses to run.
  2. Track findings in real time. Hit Start Run. A live table populates as experiments complete. Watch the Surprisal score to spot the most surprising findings. Feel free to navigate away; your results will be there when the analysis is complete.
  3. Audit the details. Click any row to slide open the Inspector Panel. Here you can verify the work: view the full hypothesis, the statistical analysis, and the actual Python code used to generate the result. It’s a completely transparent artifact you can reproduce and build on.

AutoDiscovery runs are compute-intensive and typically take several hours, so during early access we're covering the cost with a one-time credit grant. You'll automatically receive 1,000 Hypothesis Credits (1 hypothesis = 1 credit). Credits are available through Feb. 28, 2026.

How should you spend your credits? Think of your first run as a test drive. We suggest starting with a small batch (fewer than 10 hypotheses) just to see what's possible. Once you're familiar with the output, you can confidently scale up to 50–100 hypotheses per session for deep analysis on larger datasets. (Note: runs are capped at 500 hypotheses.)

You'll be prompted to confirm that uploaded data isn't confidential. Source datasets are automatically deleted 7 days after analysis completes. AutoDiscovery retains only the outputs you need to reproduce and extend your findings: hypotheses, plans, code, and results.

Try AutoDiscovery in AstaLabs today—you may be surprised by what it discovers! Need support or more information? Please reach out.

Sign up for Asta Preview to gain early access to features like AutoDiscovery. Learn more here.
