OLMoE, meet iOS
Ai2 OLMoE is now available as an on-device, state-of-the-art open-source model.
February 11, 2025
At Ai2, we are committed to building the best fully open language models. Today, we are taking a significant step in expanding the definition of fully open by releasing an iOS app that allows anyone to test our models on their devices privately and securely. The app is fully open source; it will help researchers study how to make on-device models better and enable developers to prototype new AI experiences.
Download the app today from the Apple App Store, or build it from source using our repository! Due to hardware limitations, the first version of the OLMoE app requires an iPhone 15 Pro or newer, or any M-series iPad.
Expanding our truly-open ecosystem
From the very beginning, every model in the OLMo family has been designed to be fully open: we release all software, data, and artifacts that contribute to the final model weights. However, that is only part of the journey for any language model; we also need equally open solutions to make AI accessible to users. Open source projects like vLLM and SGLang help developers deploy LLMs on a broad range of cloud servers, while Ollama and LM Studio let users experience open-weights models directly on their computers.
With the capabilities of smaller models rapidly advancing (by late 2024, 7B models easily surpassed the performance of state-of-the-art models launched the year prior) and mobile processing units getting faster, on-device AI is poised for broader adoption.
The Ai2 OLMoE app is a fully open source toolkit for researchers and developers to experiment with on-device AI. It can be used to:
- Experience firsthand which real-world tasks state-of-the-art on-device models can handle;
- Research how to improve efficient local AI models;
- Test your own model locally using our open-source codebase;
- Integrate OLMoE into other iOS applications (see the sketch below).
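As an illustration of that last point, here is a minimal Swift sketch of how another app might consume an on-device model behind a small streaming interface. The `OnDeviceLLM` protocol and `StubOLMoE` type are hypothetical names introduced for this example; the actual app builds on the llama.cpp Swift bindings and loads the quantized OLMoE weights.

```swift
import Foundation

// Hypothetical interface for an on-device model; the real app wraps the
// llama.cpp Swift bindings behind something similar.
protocol OnDeviceLLM {
    // Stream generated tokens for a prompt, fully on-device.
    func generate(prompt: String) -> AsyncStream<String>
}

// Stand-in implementation so the sketch compiles; a real implementation
// would run the quantized OLMoE weights bundled with the app.
struct StubOLMoE: OnDeviceLLM {
    func generate(prompt: String) -> AsyncStream<String> {
        AsyncStream { continuation in
            for token in ["On", "-device", " inference", "!"] {
                continuation.yield(token)
            }
            continuation.finish()
        }
    }
}

// Example usage: accumulate streamed tokens into a reply.
func reply(to prompt: String, using model: some OnDeviceLLM) async -> String {
    var text = ""
    for await token in model.generate(prompt: prompt) {
        text += token   // a real app would update the UI incrementally here
    }
    return text
}
```

Keeping the interface this small makes it straightforward to swap in a different local model, or a stubbed backend for UI testing.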
A key advantage of models like OLMoE is that they are fully private: prompts and responses never leave your device. And because they do not require an internet connection, they work reliably no matter where users are.
From model to application
To build this application, we combined our best fully open recipes. The starting point is OLMoE, our most efficient, fully open language model. We created a new version of this model, allenai/OLMoE-1B-7B-0125-Instruct: by using the Dolmino mix introduced in OLMo 2 for mid-training and the Tülu 3 post-training recipe, the new OLMoE scores, on average, 35% better on our evaluation suite while remaining as efficient as the original release.
To run on device, we reduce the size of OLMoE using Q4_K_M quantization, with minimal impact on model performance (for example, IFEval scores drop only from 66.4 to 63.6). If you are interested in testing this new OLMoE model before quantization, you can do so on the Ai2 Playground. Quantized models in GGUF format are available on Hugging Face (base and instruct).
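To get a rough sense of why 4-bit quantization matters on a phone, the back-of-the-envelope estimate below compares memory footprints; the ~4.85 bits-per-weight average assumed for Q4_K_M is an approximation for illustration, not an official figure.

```swift
import Foundation

// Rough memory-footprint estimate for OLMoE's ~7B total parameters
// (1B active per token). The ~4.85 bits/weight average assumed for
// Q4_K_M is an approximation used only for illustration.
let totalParams = 7.0e9

func sizeInGB(bitsPerWeight: Double) -> Double {
    totalParams * bitsPerWeight / 8 / 1e9
}

print(String(format: "FP16:   ~%.1f GB", sizeInGB(bitsPerWeight: 16)))   // ~14 GB: too large for a phone
print(String(format: "Q4_K_M: ~%.1f GB", sizeInGB(bitsPerWeight: 4.85))) // ~4.2 GB: fits in memory on recent Pro devices
```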
We built the Ai2 OLMoE app in partnership with GenUI on top of fantastic open source projects: starting from the Swift bindings for llama.cpp, we optimized our stack to achieve 41 tokens/s on average on an iPhone 16 Pro.
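As a sketch of how a throughput figure like this can be measured, the snippet below counts streamed tokens and divides by wall-clock time; it assumes only that the model wrapper exposes its output as an AsyncStream<String>, which is not necessarily the app's actual API.

```swift
import Foundation

// Measure end-to-end generation throughput in tokens per second.
// Note: this includes prompt-processing time; a decode-only number
// would start the clock at the first generated token instead.
func tokensPerSecond(of stream: AsyncStream<String>) async -> Double {
    let start = Date()
    var count = 0
    for await _ in stream {
        count += 1
    }
    let elapsed = Date().timeIntervalSince(start)
    return elapsed > 0 ? Double(count) / elapsed : 0
}
```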
Finally, the code of our app is fully open source — ready for AI researchers and developers to adopt. For example, it can be used as a scaffold to evaluate more efficient on-device AI models; alternatively, our model implementation can be incorporated into other applications.
We see this as a foundational step towards the future of on-device functionality. As mobile devices continue to increase in power and performance, we hope that the OLMoE app can help researchers and developers keep up with the cutting edge.
Give OLMoE a try on your iPhone today!
Join us on Discord
Join Ai2's Discord server to talk about our OLMoE app or any of our open models, share your projects, and connect with other researchers and developers working on truly open AI.