
Ai2 Newsletter

April 2026

Top story: Introducing MolmoWeb, an open agent for the web

MolmoWeb is our new open source agent that can navigate and complete tasks in a web browser on your behalf. Built on Molmo 2 in 4B and 8B sizes, it sets a new open-weight state-of-the-art across four major web-agent benchmarks and even surpasses agents built on proprietary models.

The web is the world's largest software platform. Agents that can navigate it reliably could dramatically expand access to information and digital services. But most web agents are closed, and the ones that work well have historically been built on proprietary models.

MolmoWeb works by looking at the same screen you do: given a task and a live webpage, it views the screenshot, decides what to do next, and takes action. Because it operates on screenshots rather than underlying page code, it won't break when a website changes its HTML.
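That observe-decide-act loop can be sketched in a few lines. This is an illustrative skeleton, not MolmoWeb's actual code: the `decide` stub stands in for the model call, and the `get_screenshot`/`act` callables stand in for whatever browser driver is used.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # e.g. "click", "type", or "done"
    target: str  # e.g. a screen coordinate or text to type

def decide(task: str, screenshot: bytes) -> Action:
    """Stand-in for the model call: a real agent would send the task
    and the current screenshot to the VLM and parse its response."""
    return Action("done", "")

def run_agent(task: str, get_screenshot, act, max_steps: int = 10) -> bool:
    """Screenshot -> decide -> act loop; stops when the model signals 'done'."""
    for _ in range(max_steps):
        action = decide(task, get_screenshot())
        if action.kind == "done":
            return True
        act(action)  # apply the action to the live page
    return False
```

Because the loop only consumes pixels and emits actions, nothing in it depends on the page's DOM.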

MolmoWeb outperforms all open-weight models on every benchmark we tested, and even surpasses visual agents built on much larger proprietary models, such as GPT-4o-based SoM agents. It also beats OpenAI CUA on three of four benchmarks.

Performance improves further when the model gets multiple attempts at a task. On two benchmarks (WebVoyager and Online-Mind2Web), MolmoWeb with 4 parallel attempts surpasses the best single-attempt performance of every model we evaluated, including agents powered by GPT-5 and Gemini CU Preview.
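The multiple-attempt setup is a best-of-n scheme: run several independent attempts and count the task solved if any attempt succeeds. A minimal sketch, assuming a hypothetical `attempt(task, seed)` callable that returns `True` on success:

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(attempt, task: str, n: int = 4) -> bool:
    """Run n independent attempts in parallel; the task counts as
    solved if at least one attempt succeeds."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda seed: attempt(task, seed), range(n)))
    return any(results)
```

Varying the seed (or sampling temperature) per attempt is what makes the parallel tries diverge instead of repeating the same trajectory.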

Alongside the model, we released MolmoWebMix, a comprehensive dataset for training web agents. Everything needed to inspect, reproduce, and fine-tune MolmoWeb is openly available.

MolmoPoint

Instead of generating coordinates as text, MolmoPoint lets models point by directly selecting parts of their visual input, making grounding simpler, faster, and more accurate. MolmoPoint-8B sets a new state of the art on PointBench, and MolmoPoint-GUI-8B reaches state-of-the-art among open models for GUI grounding. Three models, two new open datasets, and all training code are available now.

MolmoBot

MolmoBot is an open model suite for robotic manipulation trained entirely in simulation on MolmoSpaces, demonstrating zero-shot transfer to real-world robots without any real-world data collection. In evaluations, it outperforms comparable models in entirely unseen environments, and performance from simulation data scales effectively as object and environment diversity increases.

VLA Evaluation Harness

Evaluating VLA models has meant maintaining private eval forks per benchmark, with results that often diverge and take days to reproduce. Our new vla-evaluation-harness standardizes everything: benchmarks run inside Docker, model servers are single-file scripts, and episode sharding with batched GPU inference turns a 14-hour eval run into 18 minutes on a single H100.
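The speedup from episode sharding comes from splitting a benchmark's episodes evenly across workers so each runs its slice with batched inference. A hedged sketch of that split (the harness's real partitioning logic may differ):

```python
def shard_episodes(episodes: list, num_shards: int) -> list:
    """Round-robin split of evaluation episodes across shards so each
    worker gets an even slice; batched GPU inference then runs per shard."""
    shards = [[] for _ in range(num_shards)]
    for i, episode in enumerate(episodes):
        shards[i % num_shards].append(episode)
    return shards
```

With the episodes independent, wall-clock time drops roughly in proportion to the number of shards a GPU can serve concurrently.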

SQA Eval

In a new paper, we put ScholarQA and other deep research systems under the microscope with PhD-level experts. We found that pairwise preferences work for ranking systems overall but are unreliable for instance-level conclusions or for evaluating specific quality metrics like citation validity. Building better research agents requires moving beyond shallow evaluation toward more targeted, metric-specific approaches.

Ai2 at NVIDIA GTC 2026

Ai2 had a strong presence at NVIDIA GTC 2026 in San Jose this March, contributing to major conversations, panels, and demos about open frontier models, open-source AI for science, and what it takes to build trustworthy, scalable, production-ready open ecosystems.

For a full recap of our activities at GTC, check out our blog.
