OLMo: Open Language Model, A State-of-the-Art, Truly Open LLM and Framework
As the world races to deploy AI models that are effective and safe, the demand for open large language models (LLMs) has exploded. The rapid adoption of both open and closed models means that AI capabilities have outpaced our ability to understand how those models are created. AI2 has released OLMo 7B, a truly open, state-of-the-art large language model, together with its pre-training data and training code. This empowers researchers and developers to collectively advance the science of language models using the best open models available.
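For readers who want to try the released model directly, the snippet below is a minimal sketch of loading OLMo 7B through the Hugging Face transformers library. The hub ID allenai/OLMo-7B and this loading path are assumptions drawn from common release practice, not details stated in this post; consult the official OLMo repository for the supported workflow.

```python
# Minimal sketch: loading the released OLMo 7B weights via transformers.
# The hub ID "allenai/OLMo-7B" is an assumption, not stated in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```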
OLMo is built on AI2’s Dolma dataset, an open, three-trillion-token corpus for language model pretraining drawn from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. In an accompanying paper, the researchers document Dolma, including its design principles, the details of its construction, and a summary of its contents. They also open-source a high-performance curation toolkit that can reproduce Dolma and curate other datasets.
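Because a three-trillion-token corpus is far too large to download casually, streaming a few documents is the natural way to inspect Dolma. The sketch below assumes the corpus is mirrored on the Hugging Face Hub under allenai/dolma with a per-document text field; both the hub ID and the field name are assumptions rather than details from the paper.

```python
# Minimal sketch: streaming a few Dolma documents without a full download.
# The hub ID "allenai/dolma" and the "text" field are assumptions.
from datasets import load_dataset

dolma = load_dataset("allenai/dolma", split="train", streaming=True)
for i, doc in enumerate(dolma):
    print(doc["text"][:200])  # preview the first 200 characters of each document
    if i >= 2:
        break
```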
AI2’s new open-source LLM may reset the definition of ‘open AI’
"Being a researcher in the AI field and just working with APIs or closed models is like being an astronomer trying to research the Solar System and only having access to pictures of it from the newspaper,” says Hanna Hajishirzi, Senior Director of AllenNLP and one of the primary researchers behind OLMo. Open research will remove silos and improve efficiency in the AI research community.