Update 12/8: DataVoyager is now available in Asta
Over the past several months, we've been piloting Asta DataVoyager with researchers, students, and early-access users across a range of disciplines. Today, we're making it generally available so more people can use it as a transparent AI partner for data-driven discovery.
Over 70 universities, institutes, and companies have already been using DataVoyager to pull insights from their data. Marine ecologist Fabio Favoretto and collaborators at Scripps Institution of Oceanography used DataVoyager to explore more than twenty years of reef data from the Gulf of California. Their goal wasn't just to reproduce known analyses, but to uncover new insights they could easily share with colleagues.
Favoretto described DataVoyager as “a fantastic companion,” highlighting how it can surface unexpected relationships in complex ecological data where patterns are often hard to detect. “I am not aware of any other tool with the same level of care and focus on providing a robust scientific assistant that researchers can rely on,” he says. “As the initial excitement around AI fades, I believe that tools that are ethical, trustworthy, and open will be the most valuable not only for science but also for society.”
Try it yourself
DataVoyager is now open to anyone—you’ll find it in the “Analyze data” tab in Asta.
In addition to releasing it publicly for the first time, we've made DataVoyager more private, more transparent, and easier to use. On the privacy front, uploaded datasets are now automatically deleted after seven days, are never used for model training, and can be deleted immediately at any time. We've also improved how DataVoyager communicates its work: every analysis now surfaces the reasoning, assumptions, and supporting code behind each result, and real-time status updates keep you informed throughout longer tasks. Smaller refinements round out the release, including clearer upload guidance and one-click code copying from the results panel.
Researchers can now apply DataVoyager in any domain where they have structured data and questions that demand both trustworthiness and rigor. Instructors can build courses around it, using the tool as a scaffold for learning data analysis while still emphasizing critical thinking. And teams of all sizes have clearer deployment paths and documentation to get started.
Original post follows.
Scientists don't lack for questions they want answered; they lack hours in the day and tools they can trust to get those answers. Experimental logs live in spreadsheets, instrument readings arrive as CSVs, and results tables pile up across projects. Turning those structured files into answers takes time, and doing it efficiently often requires advanced programming skills.
To fill the gap, we’re launching DataVoyager in Asta, our ecosystem for scientific research agents. Built to help scientists drill down into structured datasets, Asta DataVoyager lets you query structured files in plain language and get cited, explainable answers with copyable code, clear visuals, and a concise, well-supported summary.
Our work with CAIA
The Cancer AI Alliance (CAIA) has already prototyped a federated instance of Asta DataVoyager on its platform. CAIA, which unites four leading cancer centers, today announced its federated learning platform, which lets AI models travel to each participating cancer center’s secure, de-identified data and learn from it locally, producing a summary of what they learned without individual clinical data ever leaving institutional firewalls. Ai2’s engagement with CAIA is made possible by generous support from Allen Family Philanthropies.
Read the CAIA announcement here.
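The core idea of federated analysis, that only summaries leave each site, can be illustrated with a toy Python sketch. This is not CAIA’s or Ai2’s implementation. The `LocalSummary` type, the site names, the made-up measurements, and the choice of a pooled mean as the thing being “learned” are all illustrative assumptions; real federated learning trains model parameters, not just means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate statistics a site is willing to share (no row-level data)."""
    count: int
    total: float


def summarize_locally(values: list[float]) -> LocalSummary:
    # Runs inside a site's firewall; only this summary is sent out.
    return LocalSummary(count=len(values), total=sum(values))


def pooled_mean(summaries: list[LocalSummary]) -> float:
    # The coordinator combines summaries without ever seeing raw values.
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


if __name__ == "__main__":
    # Hypothetical per-site measurements, invented for illustration only.
    site_data = {
        "site_a": [30.0, 42.0, 35.0],
        "site_b": [28.0, 50.0],
    }
    summaries = [summarize_locally(v) for v in site_data.values()]
    print(pooled_mean(summaries))
```

Because the pooled mean depends only on each site’s count and sum, the coordinator gets the same answer it would from the combined raw data while the raw values stay local.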
CAIA’s custom version of Asta DataVoyager will let approved scientists and researchers query these models in plain language and draw insights from more diverse AI models trained on each cancer center’s data. CAIA can then run analyses and share de-identified results without compromising data privacy or security.
As part of the initial project, CAIA researchers are preparing a lung cancer study that uses Asta DataVoyager to examine treatments and outcomes across the federated clinical data of multiple institutions. The study will explore factors such as time to surgery with neoadjuvant chemo-immunotherapy, the impact of adding immunotherapy after definitive radiation, and targeted drugs versus standard platinum chemotherapy.
When complete, this prototype could yield novel, actionable real-world insights to improve patient care.
“We are excited about the possibility of providing powerful and secure analytics tools to cancer researchers who may not have AI expertise,” says Jeff Leek, PhD, VP and Chief Data Officer at Fred Hutch Cancer Center and Holder of the J. Orin Edson Foundation Endowed Chair. Leek is also the founder and scientific director of CAIA. “When I think about the future of where I want it to go, I think about this tool in the hands of clinicians, helping to answer important questions that will ensure the best possible care for cancer patients.”
CAIA will also compare the analysis code generated by Asta DataVoyager with code written by expert biostatisticians and use that comparison to assess the quality of the machine-generated outputs.
Deep analysis for structured data
Asta DataVoyager is a trusted AI collaborator—one that lets researchers make queries about data in natural language and get transparent, reproducible answers they can act on. It was designed from the start to be intuitive for users, regardless of their comfort level working with dataset analysis tooling.
Users upload a dataset in CSV, Excel (.xlsx), JSON (.json/.jsonl), HDF5, TSV, or Parquet format and ask a question (e.g., “Which treatment arm shows the steepest improvement after week 6?”), along with an optional prompt to establish context (e.g., “use these units, measurement cadence, treatment conditions, and outcome variables”) so that Asta DataVoyager makes better initial choices.
Asta DataVoyager then outputs:
- A crisp answer to the user’s question, written for scientists
- Copyable visuals that make the finding understandable at a glance
- Copyable code that reproduces the analysis
- A methods section that documents assumptions, detailed reasoning steps, and statistical tests conducted—so users can cite the procedure or adapt it
Importantly, the output is structured and largely consistent across runs. That makes it easier to share with collaborators, copy to a lab notebook, or include in a preprint’s supplementary materials without much hand-reformatting.
Ask follow-ups (“Control for baseline weight,” “Perform non-parametric tests,” “What if we cap outliers at the 99th percentile?”), and Asta DataVoyager adds new cells to the report, much like adding a cell to a Python notebook, maintaining provenance as the analysis evolves.
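As an illustration of what a follow-up such as “cap outliers at the 99th percentile” usually means, here is a minimal Python sketch of one-sided winsorization. It is not the code Asta DataVoyager generates. The function name `cap_at_percentile` is hypothetical, and computing the percentile with `statistics.quantiles(..., method="inclusive")` is one of several reasonable interpolation conventions; a generated analysis might use another.

```python
from statistics import quantiles


def cap_at_percentile(values: list[float], pct: int = 99) -> list[float]:
    """Clip values above the pct-th percentile (one-sided winsorization)."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    cutoff = quantiles(values, n=100, method="inclusive")[pct - 1]
    # Values at or below the cutoff pass through unchanged.
    return [min(v, cutoff) for v in values]


if __name__ == "__main__":
    # Hypothetical measurements with one extreme outlier at the end.
    weights = [float(x) for x in range(1, 100)] + [1000.0]
    capped = cap_at_percentile(weights)
    print(max(weights), max(capped))
```

Capping rather than dropping keeps every observation in the sample while limiting an extreme value’s influence on means and regression fits, which is why it is a common sensitivity check.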
Trustworthy tools for research
Asta DataVoyager is a starting point for trustworthy, conversational analysis. Teams stay in full control of their data, whether they use Asta’s hosted portal or a secure on-premises, datacenter, or private cloud deployment, and they can delete datasets at any time.
Data handling is explicit and user-defined. Scientists can install Asta DataVoyager on their own infrastructure or private server and configure the system so that data remains in their purview. That’s critical for teams using sensitive or proprietary data, or operating under clinical or agency constraints.
Asta DataVoyager adds to your statistical toolkit. It shortens the path from a question to a well-supported answer and makes each step visible, so that you, your collaborators, and your reviewers can understand and trust the result.
Reach out to the Asta team to discuss secure deployments and pilot projects, and sign up for updates below.
In early pilots and collaborations, Asta DataVoyager is already helping drive science forward. We look forward to hearing what Asta’s new features enable for you, along with your suggestions for making them even better.