Ai2

Latest research

September 2025 - Fluid language model benchmarking

We explore how Fluid Benchmarking can adapt evaluation items to a language model’s capability level.

August 2025 - OLMoASR: A series of open speech recognition models

We release OLMoASR, a family of open automatic speech recognition (ASR) models trained from scratch on a curated,…

August 2025 - Asta: Accelerating science through trustworthy agentic AI

We announce Asta, our bold initiative to accelerate science through trustworthy, truly open agentic AI.

August 2025 - AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite

Introducing AstaBench, a novel AI agents evaluation framework and scientific research benchmark suite.

August 2025 - MoNaCo: More natural questions for reasoning across dozens of documents

Introducing MoNaCo, a benchmark of highly challenging questions spanning dozens of documents for evaluating large…

August 2025 - MolmoAct: An Action Reasoning Model that reasons in 3D space

MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering…

July 2025 - SciArena: A new platform for evaluating foundation models in scientific literature tasks

Discover how SciArena is being used to evaluate foundation models’ capabilities in scientific literature tasks…

June 2025 - OMEGA: Can LLMs reason outside the box in math?

Discover how OMEGA is being used to evaluate large language models’ ability to generalize in math through…

June 2025 - New applications of the Ai2 Climate Emulator (ACE) by the international climate modeling community

Learn how ACE is being used for seasonal forecasts and understanding decadal variations in global warming.