Ai2

Latest research

October 1, 2025

Asta DataVoyager: Data-driven discovery and analysis

DataVoyager is our new feature in Asta built to address the challenges scientists face in drilling down into structured datasets.
September 16, 2025

Fluid language model benchmarking

We explore how Fluid Benchmarking can adapt evaluation items to a language model’s capability level.
August 28, 2025

OLMoASR: A series of open speech recognition models

We release OLMoASR, a family of open automatic speech recognition (ASR) models trained from scratch on a curated, large-scale dataset.
August 26, 2025

Asta: Accelerating science through trustworthy agentic AI

We announce Asta, our bold initiative to accelerate science through trustworthy, truly open agentic AI.
August 26, 2025

AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite

Introducing AstaBench, a novel AI agents evaluation framework and scientific research benchmark suite.
August 19, 2025

Signal and Noise: Reducing uncertainty in language model evaluation

We find that two simple metrics, signal and noise, reveal key differences in the utility of current LLM benchmarks.
August 18, 2025

MoNaCo: More natural questions for reasoning across dozens of documents

Introducing MoNaCo, a benchmark of highly challenging questions spanning dozens of documents for evaluating large language models.
August 12, 2025

MolmoAct: An Action Reasoning Model that reasons in 3D space

MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering benchmark-topping performance.
July 22, 2025

Contextualized Evaluations: Judging language model responses to underspecified queries

How do we evaluate LLMs on underspecified queries? We show that adding clarifying context flips model rankings and uncovers model biases.