Ai2

Latest research

August 12, 2025

MolmoAct: An Action Reasoning Model that reasons in 3D space

MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering benchmark-topping performance.
Read post
July 22, 2025

Contextualized Evaluations: Judging language model responses to underspecified queries

How do we evaluate LLMs on underspecified queries? We show that adding clarifying context flips model rankings and uncovers model biases.
Read post
July 18, 2025

AutoDS: A prototype engine for autonomous, open-ended scientific discovery

AutoDS goes beyond standard data crunching by building upon its own findings and uncovering insights that may not be immediately apparent even to experienced researchers.
Read post
July 9, 2025

Introducing FlexOlmo: a new paradigm for language model training and data collaboration

Explore how FlexOlmo enables collaborative language model training without sacrificing data privacy or control, introducing a new, flexible approach to building shared AI models.
Read post
July 1, 2025

SciArena: A new platform for evaluating foundation models in scientific literature tasks

Discover how SciArena is being used to evaluate foundation models’ capabilities in scientific literature tasks through community-driven, literature-grounded, and multi-disciplinary reasoning.
Read post
June 24, 2025

OMEGA: Can LLMs reason outside the box in math?

Discover how OMEGA is being used to evaluate large language models' ability to generalize in math through exploratory, compositional, and transformative reasoning.
Read post
June 13, 2025

New applications of the Ai2 Climate Emulator (ACE) by the international climate modeling community

Learn how ACE is being used for seasonal forecasts and understanding decadal variations in global warming.
Read post
June 3, 2025

Revisiting critical batch size for large-batch OLMo pretraining

We introduce a more reliable method to measure the critical batch size (CBS), analyze how CBS changes over training, and use this to train OLMo with fewer gradient steps.
Read post