Ai2

Asta DataVoyager: Data-driven discovery and analysis

October 1, 2025

Ai2


Scientists rarely lack questions they want answered; they lack hours in the day and tools they can trust to get those answers. Experimental logs live in spreadsheets, instrument readings arrive as CSVs, and results tables pile up across projects. Turning those structured files into answers takes time, and doing it efficiently often requires advanced programming skills.

To fill this gap, we’re launching DataVoyager in Asta, our ecosystem for scientific research agents. Built for scientists who need to drill down into structured datasets, Asta DataVoyager lets you ask questions about structured files in plain language and get clearly cited, explainable answers with copyable code, clear visuals, and a concise, well-supported summary.

Our work with CAIA

The Cancer AI Alliance (CAIA) has already prototyped a federated instance of Asta DataVoyager on its platform. CAIA, which unites four leading cancer centers, today announced its federated learning platform. The platform enables AI models to travel to each participating cancer center’s secure, de-identified data and learn from it locally, generating a summary of what they learned without individual clinical data ever leaving institutional firewalls. Ai2’s engagement with CAIA is made possible by generous support from Allen Family Philanthropies.

Read the CAIA announcement here.

CAIA’s custom version of Asta DataVoyager will allow approved scientists and researchers to query these data models in plain language and draw on more diverse AI models trained on each cancer center’s data. CAIA can then run analyses and share de-identified results without compromising data privacy or security.
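
The pattern described here, in which only aggregate summaries leave each institution, can be illustrated with a small sketch. The Python below is not CAIA's platform or Asta DataVoyager code, and it is far simpler than federated model training. The `SiteSummary` schema and the `summarize_locally` and `pool` helpers are hypothetical names introduced for illustration, and the numbers are made up. It shows how a pooled mean and sample variance can be computed from per-site counts and sums, without any site sharing row-level records.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site shares (hypothetical schema)."""
    n: int            # number of records held at the site
    total: float      # sum of the outcome values
    total_sq: float   # sum of the squared outcome values


def summarize_locally(values: Iterable[float]) -> SiteSummary:
    """Runs inside a site's firewall; only the returned summary leaves."""
    vals = list(values)
    return SiteSummary(
        n=len(vals),
        total=sum(vals),
        total_sq=sum(v * v for v in vals),
    )


def pool(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Combine site summaries into a pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two records across all sites")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Made-up outcome values held at two hypothetical sites.
site_a = summarize_locally([10.0, 12.0, 14.0])
site_b = summarize_locally([20.0, 22.0])

# The coordinator sees only the two summaries, never the raw values.
mean, variance = pool([site_a, site_b])
print(f"pooled mean={mean:.2f}, pooled variance={variance:.2f}")
```

The pooled values match what a single analysis over all five records would give, which is the property that makes summary-only sharing useful for simple statistics. A real deployment would also need safeguards this sketch omits, such as minimum cell sizes before a summary is released.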

As part of the initial project, CAIA researchers are preparing a lung cancer study that uses Asta DataVoyager to examine treatments and outcomes across the federated clinical data of multiple institutions. The study will explore factors such as time to surgery with neoadjuvant chemo-immunotherapy, the impact of adding immunotherapy after definitive radiation, and targeted drugs versus standard platinum chemotherapy.

When complete, this prototype could yield novel, actionable real-world insights to improve patient care. 

“We are excited about the possibility of providing powerful and secure analytics tools to cancer researchers who may not have AI expertise,” says Jeff Leek, PhD, VP and Chief Data Officer at Fred Hutch Cancer Center and Holder of the J. Orin Edson Foundation Endowed Chair. Leek is also the founder and scientific director of CAIA. “When I think about the future of where I want it to go, I think about this tool in the hands of clinicians, helping to answer important questions that will ensure the best possible care for cancer patients.”

CAIA will also compare the analysis code generated by Asta DataVoyager with code produced by expert biostatisticians, and use the comparison to assess the quality of the machine-generated outputs.

Deep analysis for structured data

Asta DataVoyager is a trusted AI collaborator, one that lets researchers ask questions about data in natural language and get transparent, reproducible answers they can act on. It was designed from the start to be intuitive for users, regardless of how comfortable they are with data analysis tools.

Users upload a dataset in CSV, Excel (.xlsx), JSON (.json/.jsonl), HDF5, TSV, or Parquet format and ask a question (e.g., “Which treatment arm shows the steepest improvement after week 6?”), along with an optional prompt to establish context (e.g., “use these units, measurement cadence, treatment conditions, and outcome variables”) so that Asta DataVoyager makes better initial choices.
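
To make the example question concrete, here is a rough sketch of the kind of analysis it might translate into. This is not output from Asta DataVoyager. The rows, the (arm, week, score) layout, and the helper names `ols_slope` and `slopes_after` are assumptions made for illustration, as are the choices to read "after week 6" as weeks strictly greater than 6 and to treat a higher score as improvement.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_after(rows, start_week):
    """Per-arm slope of score over week, using only weeks > start_week."""
    weeks = defaultdict(list)
    scores = defaultdict(list)
    for arm, week, score in rows:
        if week > start_week:
            weeks[arm].append(week)
            scores[arm].append(score)
    return {arm: ols_slope(weeks[arm], scores[arm]) for arm in weeks}


# Made-up measurements: (treatment arm, week, outcome score).
rows = [
    ("A", 4, 50.0), ("A", 8, 52.0), ("A", 10, 54.0), ("A", 12, 56.0),
    ("B", 4, 50.0), ("B", 8, 53.0), ("B", 10, 57.0), ("B", 12, 61.0),
]

slopes = slopes_after(rows, start_week=6)
steepest = max(slopes, key=slopes.get)
print(steepest, slopes)
```

The optional context prompt matters for exactly the choices hard-coded here: whether week 6 itself is included, which column is the outcome, and whether larger values are better.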

Asta DataVoyager then outputs: 

  • A crisp answer to the user’s question, written for scientists
  • Copyable visuals that make the finding understandable at a glance
  • Copyable code that reproduces the analysis
  • A methods section that documents assumptions, detailed reasoning steps, and statistical tests conducted—so users can cite the procedure or adapt it

Importantly, the output is structured and largely consistent across runs. That makes it easier to share with collaborators, copy to a lab notebook, or include in a preprint’s supplementary materials without much hand-reformatting.

Ask follow-ups (“Control for baseline weight,” “Perform non-parametric tests,” “What if we cap outliers at the 99th percentile?”), and Asta DataVoyager adds new cells to the report (equivalent to adding a cell in a Python notebook)—maintaining provenance as the analysis evolves.
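
As an illustration of what one such follow-up might involve, the sketch below caps values at the 99th percentile using only Python's standard library. It is not code generated by Asta DataVoyager. The `cap_at_percentile` helper and the data are hypothetical, and the "inclusive" interpolation method is one reasonable choice among several percentile definitions.

```python
import statistics


def cap_at_percentile(values, percentile=99):
    """Return (capped_values, cap), with values above the cap replaced by it.

    `percentile` must be an integer from 1 to 99.
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    cap = cut_points[percentile - 1]
    return [min(v, cap) for v in values], cap


# Made-up data: 99 ordinary readings plus one extreme outlier.
values = [float(v) for v in range(1, 100)] + [1000.0]

capped, cap = cap_at_percentile(values, percentile=99)
print(f"cap={cap:.2f}, max before={max(values)}, max after={max(capped)}")
```

Only the outlier changes; every reading at or below the cap is left as it was, which keeps the follow-up easy to audit against the original analysis.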

Trustworthy tools for research

Asta DataVoyager is a starting point for trustworthy, conversational analysis. It allows teams to stay in full control of their data: they can delete datasets at any time, whether from Asta’s hosted portal or from secure on-premises, data center, and private cloud deployments.

Data handling is explicit and user-defined. Scientists can install Asta DataVoyager on their own infrastructure or private server and configure the system so that data remains in their purview. That’s critical for teams using sensitive or proprietary data, or operating under clinical or agency constraints.

Asta DataVoyager supercharges your statistical toolkit. It shortens the path from a question to a well-supported answer and makes each step visible, so that you, your collaborators, and your reviewers can understand and trust the result.

Reach out to the Asta team to discuss secure deployments and pilot projects, and sign up for updates below. 

In early pilots and collaborations, Asta DataVoyager is already helping drive science forward. We look forward to hearing what Asta’s new features enable for you—and suggestions to make them even better.

Sign up for updates, or to request early access.