Embodied AI
We lead cutting-edge research to develop the next generation of intelligent robots, safely trained in advanced simulation environments to automate routine tasks and improve daily life. All of our discoveries are open-sourced, enabling the community to build on our progress and collaboratively shape the future of embodied AI.
Advancing vision-language models for robotics
Our vision-language models (VLMs) are built to go beyond understanding text – they are designed to perceive and interact with the physical world. Trained on tasks grounded in robotics, these models emphasize spatial reasoning and the ability to interpret and point to objects in their environment. Our data collection is purposeful and focused on real-world utility. The result is Molmo, a family of open, state-of-the-art multimodal models that match or exceed the performance of proprietary systems. On both academic benchmarks and human evaluations, Molmo consistently outperforms models up to 10× its size. By releasing these models openly, we enable the research community to innovate, iterate, and build on transparent, adaptable foundations.
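To illustrate the pointing capability, here is a minimal sketch of querying one of the openly released Molmo checkpoints through Hugging Face Transformers; the image file and prompt are illustrative assumptions, and the loading pattern follows the public model cards.

```python
# Minimal sketch: asking a Molmo checkpoint to point at an object in an image.
# The image path and prompt are illustrative; the loading pattern follows
# the public Molmo model cards.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

MODEL_ID = "allenai/Molmo-7B-D-0924"  # one of the openly released checkpoints

processor = AutoProcessor.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Build a multimodal prompt: one image plus a pointing request.
inputs = processor.process(
    images=[Image.open("kitchen.jpg")],
    text="Point to the mug on the counter.",
)
# Move tensors to the model's device and add a batch dimension.
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
# Decode only the newly generated tokens; pointing answers include
# image coordinates for the requested object.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```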
Training at massive scale in simulations
We train policies for embodied AI at massive scale in simulation, leveraging procedural generation and a library of more than 10 million 3D assets to create a stunning diversity of virtual environments. Award-winning tools like ProcTHOR, Objaverse, Objaverse-XL, and Holodeck enable far greater diversity of training environments. This scale and visual diversity enable us to train generalizable policies for zero-shot real transfer.
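For a concrete sense of this workflow, the sketch below loads procedurally generated ProcTHOR houses and steps an agent through one of them in AI2-THOR; the dataset name and actions follow the public releases, while the specific house index and printed fields are illustrative.

```python
# Minimal sketch: load procedurally generated ProcTHOR houses and step an
# agent through one of them in AI2-THOR. House index and printed fields
# are illustrative.
import prior                                # pip install prior
from ai2thor.controller import Controller   # pip install ai2thor

# ProcTHOR-10K: 10,000 procedurally generated interactive houses.
dataset = prior.load_dataset("procthor-10k")
house = dataset["train"][0]                 # a full house specification

# Launch an AI2-THOR controller directly inside the generated house.
controller = Controller(scene=house)
event = controller.step(action="RotateRight")
print(event.metadata["agent"]["position"])

controller.stop()
```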
Zero-shot real transfer
We’re developing new ways for robots to learn to navigate real-world environments with training done entirely in simulation. By removing the need for costly real-world data collection, our approach makes it much easier to scale up embodied AI. Projects like SPOC, PoliFormer, and FLaRe are leading the way, showing that simulation-trained robots can operate successfully in unfamiliar real-world spaces.
Our teams continue to push the boundaries of embodied AI, focusing on open data and platforms that support training and experimentation across the broader community.
AI2-THOR
AI2-THOR is an open-source simulation platform designed for embodied AI and robotics research. It provides near photo-realistic 3D indoor environments in which virtual agents can navigate, interact with objects, and learn from their actions.
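A minimal interaction loop looks roughly like the sketch below; the scene name and actions are standard AI2-THOR built-ins, while the specific fields printed are chosen for illustration.

```python
# Minimal sketch of the AI2-THOR interaction loop: start a scene, take
# actions, and read the resulting state back from event metadata.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan10")   # a built-in kitchen scene

# Each step returns an Event carrying an image frame plus full metadata
# about the agent and every object in the scene.
event = controller.step(action="MoveAhead")
event = controller.step(action="RotateRight", degrees=90)

print(event.metadata["agent"]["position"])
for obj in event.metadata["objects"][:5]:
    print(obj["objectType"], obj["visible"])

controller.stop()
```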
GraspMolmo
GraspMolmo is an open-source AI model that helps robots understand everyday language to pick things up in smart, task-aware ways—like grabbing a teapot by the handle when asked to pour tea. Trained on a massive synthetic dataset, it outperforms previous systems and even handles complex, cluttered scenes.
The One RING
Modern robots vary significantly in style and ability, but most navigation policies are trained on only one robot and fail to generalize to another. RING (Robotic Indoor Navigation Generalist) is a novel embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator.