Open models - Multimodal models
Our work in multimodal AI has been groundbreaking from the start, and we're continuing to push the boundaries of what these models can do. By combining information across modalities, multimodal models promise higher accuracy and a more complete picture of context, making this an exciting and rapidly evolving frontier for AI.
Featured model - Molmo
Molmo is a family of open, state-of-the-art multimodal AI models. Our most powerful model closes the gap between open and proprietary systems across a wide range of academic benchmarks as well as human evaluations, and our smaller models outperform others 10x their size.
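To show what running Molmo looks like in practice, here is a minimal inference sketch following the pattern published on the Hugging Face model card for the allenai/Molmo-7B-D-0924 checkpoint. The processor.process and model.generate_from_batch helpers are supplied by the checkpoint's own remote code (hence trust_remote_code=True), and the image URL is just a placeholder, so treat the details as assumptions that may differ across Molmo variants.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

# Load the processor and model; Molmo ships its own modeling code,
# so trust_remote_code=True is required.
MODEL_ID = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Prepare a (placeholder) image and a text prompt.
image = Image.open(requests.get("https://picsum.photos/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")

# Move inputs to the model's device and add a batch dimension.
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generate up to 200 new tokens, stopping at the end-of-text marker.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)

# Decode only the newly generated tokens.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```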
Unified-IO
The first general-purpose neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP).
Unified-IO-2
The first autoregressive multimodal model capable of understanding and generating images, text, audio, and action, achieving state-of-the-art performance on the GRIT benchmark and strong results on more than 30 benchmarks.
VisProg
A modular and interpretable neuro-symbolic vision system that can decompose natural language instructions into a sequence of steps and then use existing pretrained neural models, image processing subroutines, or arithmetic and logical operations to execute these steps.
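To make that pipeline concrete, below is a toy interpreter for VisProg-style programs. The step syntax (one OUTPUT=MODULE(arg=value, ...) assignment per line) mirrors the programs VisProg generates, but the module registry and its stand-in implementations are purely illustrative assumptions; in the real system each step dispatches to a pretrained neural model, an image-processing subroutine, or an arithmetic/logical operation.

```python
import re

# One step per line: OUTPUT = MODULE(name=value, name=value, ...)
STEP = re.compile(r"(\w+)\s*=\s*(\w+)\((.*)\)\s*$")

def _resolve(token, env):
    """Quoted tokens are string literals; bare tokens name earlier outputs."""
    token = token.strip()
    if token.startswith(("'", '"')):
        return token.strip("'\"")
    return env[token]

# Hypothetical module registry: real VisProg steps call pretrained models
# (detection, VQA, segmentation), image subroutines, or arithmetic/logic.
MODULES = {
    "LOC": lambda image, object: [f"box_of_{object}_1", f"box_of_{object}_2"],
    "COUNT": lambda box: len(box),
    "RESULT": lambda var: var,
}

def run_program(program, image):
    """Execute each step in order, storing results in a shared environment
    so later steps can consume earlier outputs by name."""
    env = {"IMAGE": image}
    for line in program.strip().splitlines():
        out_var, module, args = STEP.match(line.strip()).groups()
        kwargs = {
            name.strip(): _resolve(value, env)
            for name, value in (pair.split("=", 1) for pair in args.split(","))
        }
        env[out_var] = MODULES[module](**kwargs)
    return env[out_var]  # the final step's output is the answer

program = """
BOX0=LOC(image=IMAGE,object='dog')
ANSWER0=COUNT(box=BOX0)
FINAL=RESULT(var=ANSWER0)
"""
print(run_program(program, image="<input image>"))  # -> 2
```

Because each step's output is stored under an explicit variable name, the execution trace itself doubles as an interpretable record of how the final answer was produced, which is the key design point of the neuro-symbolic approach described above.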