Open models - Multimodal models
Our work in multimodal AI has been groundbreaking from the start, and we're continuing to push the boundaries of what these models can do. By combining information across modalities, multimodal models promise higher accuracy and a more complete picture of context, making this an exciting and rapidly evolving frontier for AI.
Featured model - Molmo
Molmo is a family of open, state-of-the-art multimodal AI models. Our most powerful model closes the gap between open and proprietary systems across a wide range of academic benchmarks as well as human evaluations, and our smaller models outperform others 10x their size.
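To show what running Molmo looks like in practice, here is a minimal inference sketch following the pattern published on the Hugging Face model card for the allenai/Molmo-7B-D-0924 checkpoint. The processor.process and model.generate_from_batch helpers are supplied by the checkpoint's own remote code (hence trust_remote_code=True), and the image URL is just a placeholder, so treat the details as assumptions that may differ across Molmo variants.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

# Load the processor and model; Molmo ships its own modeling code,
# so trust_remote_code=True is required.
MODEL_ID = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Prepare a (placeholder) image and a text prompt.
image = Image.open(requests.get("https://picsum.photos/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")

# Move inputs to the model's device and add a batch dimension.
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generate up to 200 new tokens, stopping at the end-of-text marker.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)

# Decode only the newly generated tokens.
generated = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```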
Unified-IO
The first general-purpose neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP).
Unified-IO-2
The first autoregressive multimodal model capable of understanding and generating images, text, audio, and action, achieving state-of-the-art performance on the GRIT benchmark and strong results on more than 30 benchmarks.
VisProg
A modular and interpretable neuro-symbolic vision system that can decompose natural language instructions into a sequence of steps and then use existing pretrained neural models, image processing subroutines, or arithmetic and logical operations to execute these steps.
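To make that pipeline concrete, below is a toy interpreter for VisProg-style programs. The step syntax (one OUTPUT=MODULE(arg=value, ...) assignment per line) mirrors the programs VisProg generates, but the module registry and its stand-in implementations are purely illustrative assumptions; in the real system each step dispatches to a pretrained neural model, an image-processing subroutine, or an arithmetic/logical operation.

```python
import re

# One step per line: OUTPUT = MODULE(name=value, name=value, ...)
STEP = re.compile(r"(\w+)\s*=\s*(\w+)\((.*)\)\s*$")

def _resolve(token, env):
    """Quoted tokens are string literals; bare tokens name earlier outputs."""
    token = token.strip()
    if token.startswith(("'", '"')):
        return token.strip("'\"")
    return env[token]

# Hypothetical module registry: real VisProg steps call pretrained models
# (detection, VQA, segmentation), image subroutines, or arithmetic/logic.
MODULES = {
    "LOC": lambda image, object: [f"box_of_{object}_1", f"box_of_{object}_2"],
    "COUNT": lambda box: len(box),
    "RESULT": lambda var: var,
}

def run_program(program, image):
    """Execute each step in order, storing results in a shared environment
    so later steps can consume earlier outputs by name."""
    env = {"IMAGE": image}
    for line in program.strip().splitlines():
        out_var, module, args = STEP.match(line.strip()).groups()
        kwargs = {
            name.strip(): _resolve(value, env)
            for name, value in (pair.split("=", 1) for pair in args.split(","))
        }
        env[out_var] = MODULES[module](**kwargs)
    return env[out_var]  # the final step's output is the answer

program = """
BOX0=LOC(image=IMAGE,object='dog')
ANSWER0=COUNT(box=BOX0)
FINAL=RESULT(var=ANSWER0)
"""
print(run_program(program, image="<input image>"))  # -> 2
```

Because each step's output is stored under an explicit variable name, the execution trace itself doubles as an interpretable record of how the final answer was produced, which is the key design point of the neuro-symbolic approach described above.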