Ai2 Newsletter

March 2025

Ai2 OLMoE: Fully open source. Running entirely on-device. The bottom left shows the app interface on iPad and iPhone. The right shows Luca Soldaini, lead researcher on the OLMoE app.

Top story - OLMoE, an open-source iOS app

As phones get faster, more AI will happen on device. That's why we released a fully open iOS app that allows anyone to test our models on their devices privately and securely. OLMoE can help researchers study how to make on-device models better, and it enables developers to prototype new AI experiences. We also refreshed the model using OLMo 2 mid-training and Tülu 3 post-training recipes!

Due to hardware limitations, this version of OLMoE requires an iPhone 15 Pro or newer, or any M-series iPad. Read the blog for more behind-the-scenes details and links to the code.

Download the app

Watch demo video

Screenshot of the olmOCR demo. A handwritten note is on the left, and the HTML text extracted from it is on the right, along with page metadata including primary language, rotation, table, diagram and so on.

olmOCR converts PDFs into plain text

PDFs are notoriously difficult to extract text from. Standard tools struggle with reading order, often jumbling text or missing content entirely, especially with scanned documents and handwritten text. We trained olmOCR on academic papers, technical documentation, and other reference content, and we use a unique prompting technique to increase accuracy and decrease hallucinations. It outperforms leading PDF tools at 1/32 the computing cost of GPT-4o, and it is open source! Read the blog to learn how we built and evaluated olmOCR.

Try olmOCR

A decade of keeping wildlife and people safe

Over the past ten years, EarthRanger has evolved from an ambitious idea into a widely adopted platform that helps conservation organizations worldwide. But today, the focus isn’t on technology; it’s on the people. Rangers, ecologists, and park managers work tirelessly to safeguard nature, promote human-wildlife coexistence, and, just as importantly, ensure the safety of those on the front lines.

Read the blog

Average performance on 7 text-rich benchmarks: ChartQA, DocVQA, InfoVQA, TableVQA, AI2D, TextVQA, ScreenQA.

Code-guided synthetic data generation

Reasoning about images with rich text, such as charts and documents, is a critical application of vision-language models (VLMs). However, VLMs often struggle in these domains due to the scarcity of diverse text-rich vision-language data. We present CoSyn, a framework that leverages the coding capabilities of text-only LLMs to automatically create synthetic text-rich multimodal data.
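
For readers who want a feel for the idea, here is a minimal Python sketch of code-guided data generation. It is an illustration under stated assumptions, not the CoSyn implementation: `stub_llm_write_chart_code` is a hypothetical stand-in for a text-only LLM call and returns a fixed program, the "image" is a simple SVG bar chart, and the question-answer pairs are derived by rule from the chart's data rather than by prompting a model with the generated code.

```python
import json


def stub_llm_write_chart_code() -> str:
    """Hypothetical stand-in for a text-only LLM that writes rendering code.

    A real pipeline would prompt a model here; this stub returns a fixed
    program so the sketch runs offline.
    """
    return '''
data = {"Q1": 12, "Q2": 30, "Q3": 21}
bars = []
for i, (label, value) in enumerate(data.items()):
    x = 20 + i * 60
    bars.append(f'<rect x="{x}" y="{100 - value}" width="40" height="{value}"/>')
    bars.append(f'<text x="{x}" y="115">{label}</text>')
svg = '<svg xmlns="http://www.w3.org/2000/svg" width="220" height="120">' + "".join(bars) + "</svg>"
'''


def render(code: str) -> dict:
    # Run the generated program in a fresh namespace and collect what it defined.
    # A real pipeline should sandbox this step, since the code is model-written.
    namespace: dict = {}
    exec(code, namespace)
    return {"svg": namespace["svg"], "data": namespace["data"]}


def make_qa(data: dict) -> list[dict]:
    # The data behind the image comes from the code, so answers are known exactly.
    top = max(data, key=data.get)
    total = sum(data.values())
    return [
        {"question": "Which category has the tallest bar?", "answer": top},
        {"question": "What is the total across all categories?", "answer": str(total)},
    ]


def build_example() -> dict:
    # One synthetic training example: the rendered image, its source code, and QA pairs.
    code = stub_llm_write_chart_code()
    rendered = render(code)
    return {"image_svg": rendered["svg"], "code": code, "qa": make_qa(rendered["data"])}


if __name__ == "__main__":
    example = build_example()
    print(json.dumps(example["qa"], indent=2))
```

The point of the sketch is that the code serves as a text description of the image, so a text-only model never needs to see pixels to produce text-rich image data with reliable annotations.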

More from us

Ai2 Newsletter Archive