Latest research
October 1, 2025
Asta DataVoyager: Data-driven discovery and analysis
DataVoyager is our new feature in Asta built to address the challenges scientists face in drilling down into structured datasets.
September 16, 2025
Fluid language model benchmarking
We explore how Fluid Benchmarking can adapt evaluation items to a language model’s capability level.
August 28, 2025
OLMoASR: A series of open speech recognition models
We release OLMoASR, a family of open automatic speech recognition (ASR) models trained from scratch on a curated, large-scale dataset.
August 26, 2025
Asta: Accelerating science through trustworthy agentic AI
We announce Asta, our bold initiative to accelerate science through trustworthy, truly open agentic AI.
August 26, 2025
AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite
Introducing AstaBench, a novel AI agents evaluation framework and scientific research benchmark suite.
August 19, 2025
Signal and Noise: Reducing uncertainty in language model evaluation
We find that two simple metrics, signal and noise, reveal key differences in the utility of current LLM benchmarks.
August 18, 2025
MoNaCo: More natural questions for reasoning across dozens of documents
Introducing MoNaCo, a benchmark of highly challenging questions spanning dozens of documents for evaluating large language models.
August 12, 2025
MolmoAct: An Action Reasoning Model that reasons in 3D space
MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering benchmark-topping performance.
July 22, 2025