
Multimodal models

Our work in multimodal AI has been groundbreaking from the start, and we’re continuing to push the boundaries of what these models can do. By combining text with images, audio, and other signals, multimodal models can deliver higher accuracy and more complete context, making this an exciting and rapidly evolving frontier for AI.

Unified-IO

The first general-purpose neural model to perform a large and diverse set of AI tasks spanning classical computer vision, image synthesis, vision-and-language, and natural language processing (NLP).

Unified-IO-2

The first autoregressive multimodal model capable of understanding and generating images, text, audio, and action, achieving state-of-the-art performance on the GRIT benchmark and strong results on more than 30 benchmarks.
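
One way to picture what "autoregressive multimodal" means is that content from different modalities is serialized into a single stream of discrete tokens that one model can be trained on. The sketch below only illustrates that serialization pattern; the special tokens, id offsets, and stand-in encoders are assumptions made for the example, not Unified-IO 2's actual tokenizers or vocabulary layout.

```python
# Toy illustration of serializing multiple modalities into one discrete token
# stream for a single autoregressive model. All special tokens, id offsets, and
# stand-in encoders are invented for this example; they are not Unified-IO 2's
# actual tokenizers or vocabulary.

from typing import List

SPECIAL = {"<image>": 0, "<text>": 1, "<audio>": 2, "<action>": 3, "<eos>": 4}
TEXT_OFFSET = 10          # hypothetical start of the text-token id range
IMAGE_OFFSET = 10_000     # hypothetical start of the image-codebook id range

def encode_text(words: List[str]) -> List[int]:
    # Stand-in tokenizer: map each word to a deterministic id in the text range.
    return [TEXT_OFFSET + (sum(ord(c) for c in w) % 5_000) for w in words]

def encode_image(patch_codes: List[int]) -> List[int]:
    # Assume a VQ-style image encoder already produced a discrete code per patch.
    return [IMAGE_OFFSET + c for c in patch_codes]

def build_sequence(words: List[str], patch_codes: List[int]) -> List[int]:
    """Interleave modality tags and tokens into one flat sequence."""
    return (
        [SPECIAL["<image>"]] + encode_image(patch_codes)
        + [SPECIAL["<text>"]] + encode_text(words)
        + [SPECIAL["<eos>"]]
    )

if __name__ == "__main__":
    seq = build_sequence(["what", "is", "shown", "?"], patch_codes=[12, 7, 305])
    print(seq)  # one token stream an autoregressive decoder could be trained on
```

In this style of setup, generating an image or an audio clip amounts to predicting tokens in the corresponding id range and decoding them back with that modality's decoder.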

VisProg

A modular and interpretable neuro-symbolic vision system that can decompose natural language instructions into a sequence of steps and then use existing pretrained neural models, image processing subroutines, or arithmetic and logical operations to execute these steps.
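
The decompose-then-execute pattern can be pictured as a tiny interpreter: a planner turns the instruction into a short program, and each step calls a registered module standing in for a pretrained vision model, an image-processing subroutine, or an arithmetic or logical operation. The sketch below is a toy illustration of that pattern only; the module names, program format, and hard-coded plan are hypothetical and do not reflect VisProg's actual modules or program syntax.

```python
# Toy sketch of the neuro-symbolic pattern described above: an instruction becomes
# a short program, and each step calls a registered module. The module names,
# program format, and the hard-coded plan are hypothetical.

from typing import Callable, Dict, List

# Registry of executable modules; a real system would wrap detectors, VQA models, etc.
MODULES: Dict[str, Callable[..., object]] = {
    "FIND": lambda image, query: [f"{query}_box_in_{image}"],  # stub object detector
    "COUNT": lambda boxes: len(boxes),                         # arithmetic step
    "RESULT": lambda value: value,                             # return the final answer
}

def plan(instruction: str) -> List[dict]:
    """Stand-in planner: a real system would prompt an LLM to write the program."""
    return [
        {"out": "boxes", "op": "FIND", "args": {"image": "IMAGE", "query": "dog"}},
        {"out": "n", "op": "COUNT", "args": {"boxes": "boxes"}},
        {"out": "answer", "op": "RESULT", "args": {"value": "n"}},
    ]

def execute(program: List[dict], image: str) -> object:
    """Run each step in order, resolving arguments against earlier step outputs."""
    env: Dict[str, object] = {"IMAGE": image}
    for step in program:
        args = {k: env.get(v, v) for k, v in step["args"].items()}
        env[step["out"]] = MODULES[step["op"]](**args)
    return env[program[-1]["out"]]

if __name__ == "__main__":
    program = plan("How many dogs are in the image?")
    print(execute(program, image="photo.jpg"))  # -> 1 with the stub modules
```

Keeping each intermediate result under a name is what makes this style of system inspectable: the executed program doubles as a step-by-step trace of how the answer was produced.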