Ai2 Newsletter
July 2026
Top story - Introducing MolmoMotion—an open model that predicts where objects will move
This month, we released MolmoMotion, an open vision-language model that takes a few seconds of video and predicts where the objects in it will move next in 3D. MolmoMotion forecasts the 3D paths of points you mark on an object, conditioned on a plain-language instruction like "move and rotate the wooden bowl."
MolmoMotion is open source: we're releasing the model weights, the training code, and MolmoMotion-1M, its training set. Drawn from 1.16 million videos, MolmoMotion-1M is the largest collection of action-described 3D point trajectories we know of. Built on the Molmo 2 backbone, MolmoMotion extends our Molmo line from describing and pointing at objects in a scene to forecasting how those objects will behave.

On PointMotionBench, our new human-validated benchmark of held-out 3D trajectories released alongside the model, MolmoMotion is the most accurate 3D forecaster we measured, ahead of every video-generation, parametric-3D, and constant-velocity baseline we tested. Those forecasts pay off in downstream applications: a robot can learn manipulation tasks from far fewer demonstrations, and the same predictions can steer a video generator to improve the quality of its outputs.
We think learning to forecast motion is a promising direction for embodied and generative AI, and we look forward to seeing what the community builds on what we've released.
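For readers curious what the simplest of those baselines involves, here is a minimal sketch of a constant-velocity forecaster for a single tracked 3D point. It is an illustration only, not the baseline code used for PointMotionBench: the function name, the per-frame tuple format, and the choice to estimate velocity from the last two frames are all assumptions.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def constant_velocity_forecast(observed: List[Point3D], horizon: int) -> List[Point3D]:
    """Extrapolate a point's path, assuming it keeps its last observed velocity.

    `observed` holds one (x, y, z) position per frame (hypothetical format);
    `horizon` is the number of future frames to predict.
    """
    if len(observed) < 2:
        raise ValueError("need at least two observed positions to estimate velocity")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")

    last, prev = observed[-1], observed[-2]
    # Displacement per frame between the two most recent observations.
    velocity = tuple(a - b for a, b in zip(last, prev))

    # Step the last position forward along that fixed velocity.
    return [
        tuple(p + v * step for p, v in zip(last, velocity))
        for step in range(1, horizon + 1)
    ]


# A point sliding along the x axis at 0.25 units per frame.
observed = [(0.0, 0.0, 1.0), (0.25, 0.0, 1.0), (0.5, 0.0, 1.0)]
forecast = constant_velocity_forecast(observed, horizon=2)
```

A baseline like this ignores both the instruction and the scene, so it cannot anticipate a rotation or a change of direction. That makes it a useful floor for comparing learned forecasters against.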
OlmoEarth v1.2
OlmoEarth v1.2, the latest in our family of open Earth observation models, improves performance and efficiency by swapping absolute positional encodings for rotary ones, which removes unwanted striping artifacts in the embeddings; we recommend it as a drop-in replacement for v1.
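To make the encoding swap concrete, here is a minimal sketch of a rotary positional encoding in its common one-dimensional textbook form. It is not OlmoEarth's implementation, which this newsletter does not describe: the function names, the base of 10000, and the use of a single scalar position are assumptions.

```python
import math
from typing import List


def rotary_encode(vec: List[float], position: float, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs of `vec` by angles proportional to `position`."""
    if len(vec) % 2 != 0:
        raise ValueError("vector length must be even")
    dim = len(vec)
    out: List[float] = []
    for i in range(0, dim, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Both pairs of positions are 2 apart, so the two scores agree
# (up to floating-point error) even though the absolute positions differ.
score_near = dot(rotary_encode(q, 3), rotary_encode(k, 1))
score_far = dot(rotary_encode(q, 12), rotary_encode(k, 10))
```

Because the attention score between two rotated vectors depends only on the offset between their positions, the encoding carries relative rather than absolute position information.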
olmo-eval
olmo-eval, our new model evaluation framework and the successor to OLMES, is built for the day-to-day loop of developing a modern LLM. It allows you to easily add a benchmark, re-run it across changing checkpoints, and swap the model, its tools, the environment, and any judge or grading model as fully independent components.
AutoDiscovery access extended
Early access to AutoDiscovery – our agent for autonomous scientific hypothesis generation in Asta – is now extended through July 31. Point AutoDiscovery at your dataset and it generates and tests hypotheses on its own, ranking findings by Bayesian surprise—how much each result disagrees with the expected pattern in the data.
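As a rough illustration of the idea, Bayesian surprise is commonly defined as the KL divergence from a prior belief to the posterior belief after seeing a result. The sketch below uses that textbook definition over discrete hypotheses. How AutoDiscovery actually computes and ranks surprise is not described here, so the function name, the finding labels, and the probabilities are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def bayesian_surprise(prior: Sequence[float], posterior: Sequence[float]) -> float:
    """KL(posterior || prior) in nats, over the same discrete set of hypotheses."""
    if len(prior) != len(posterior):
        raise ValueError("prior and posterior must cover the same hypotheses")
    total = 0.0
    for p, q in zip(prior, posterior):
        if q == 0.0:
            continue  # terms with zero posterior mass contribute nothing
        if p == 0.0:
            return math.inf  # the posterior backs something the prior ruled out
        total += q * math.log(q / p)
    return total


# Hypothetical findings: (belief before the test, belief after the test).
findings: Dict[str, Tuple[Sequence[float], Sequence[float]]] = {
    "no shift": ([0.5, 0.5], [0.5, 0.5]),
    "mild shift": ([0.5, 0.5], [0.6, 0.4]),
    "strong shift": ([0.5, 0.5], [0.9, 0.1]),
}

# Most surprising finding first.
ranked = sorted(findings, key=lambda name: bayesian_surprise(*findings[name]), reverse=True)
```

A result that leaves beliefs unchanged scores zero, and the score grows the further the evidence moves the posterior from the prior.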
ModSleuth
Modern open models are built on long chains of other models and datasets, and those dependencies are hard to trace by hand. ModSleuth reconstructs them automatically: starting from public model cards, dataset cards, and reports, it builds an evidence-grounded dependency graph for a release.
ACE2S-SHiELD+
ACE2S-SHiELD+ is the newest member of our open climate emulator family, built to simulate atmospheric variability from days to centuries. It adds the ability to separate the independent effects of sea-surface temperature and CO2 while running much faster than its predecessor, SHiELD.
DiScoFormer
DiScoFormer is a transformer that estimates both the probability density and the score of a distribution from a sample of points, in a single forward pass and with no retraining for each new distribution.
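For context on the two quantities involved, the score of a distribution is the gradient of its log density. The sketch below estimates both from a sample using a classical Gaussian kernel density estimate in one dimension. This is a swapped-in textbook technique for illustration, not DiScoFormer's method, and the function name and bandwidth parameter are assumptions.

```python
import math
from typing import Sequence, Tuple


def kde_density_and_score(samples: Sequence[float], x: float, bandwidth: float) -> Tuple[float, float]:
    """Gaussian-KDE estimates of p(x) and of the score d/dx log p(x)."""
    if not samples:
        raise ValueError("need at least one sample")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    h2 = bandwidth ** 2
    # Unnormalized Gaussian kernel centered on each sample, evaluated at x.
    kernels = [math.exp(-((x - s) ** 2) / (2 * h2)) for s in samples]
    total = sum(kernels)
    if total == 0.0:
        # x is so far from every sample that all kernels underflow.
        return 0.0, float("nan")

    density = total / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    # Each kernel's derivative is kernel * (s - x) / h^2; dividing the summed
    # derivative by the summed kernels gives the gradient of the log density.
    score = sum(k * (s - x) for k, s in zip(kernels, samples)) / (h2 * total)
    return density, score


samples = [-1.0, 0.0, 0.5, 2.0]
density, score = kde_density_and_score(samples, x=0.3, bandwidth=0.7)
```

An estimator like this has to be recomputed from the raw sample and needs a hand-picked bandwidth, which is the kind of per-distribution effort a single-pass learned model aims to avoid.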
Global Nature Positive Summit 2026
We'll be at the Global Nature Positive Summit in Kumamoto, Japan, July 14–16, to talk about OlmoEarth—our open, end-to-end platform for Earth intelligence. OlmoEarth integrates satellite, radar, and map data to turn raw Earth observations into timely, trustworthy, and actionable insights, from local to global scale. Ted Schmitt, our Senior Director of Conservation, will share how we're putting it to work with the Global Ecosystems Atlas—part of a broader effort to make planetary intelligence open and accessible to every mission-driven organization.
The Summit convenes governments, companies, and conservation groups around the global commitment to halt and reverse nature loss by 2030.
The Multimodal Digital Agents workshop at ECCV 2026
We're co-organizing the first Workshop on Multimodal Digital Agents at ECCV 2026 in Malmö, Sweden, on September 8—a full day on vision-centric agents that perceive and act across web, desktop, and mobile interfaces, spanning perception, reasoning, benchmarks, and safety. Speakers include Graham Neubig, Ranjay Krishna, Qianhui Wu, and Alexandre Drouin, and there's a best-paper award.
Full papers and extended abstracts are due July 29—submit at mda-workshop.allen.ai.