Documentation
Getting started with OLMo
OLMo is a series of Open Language Models released with open access to the data, training code, models, and evaluation code necessary to advance AI and to study language models collectively.
Visit the Ai2 Playground to interact with OLMo. Follow this guide to run OLMo 2 on your local device.
Prerequisites:
- Transformers from git commit 3cb8676
- Torch version 2.5.1 or newer
- Python version 3.8 or newer
In this example, we’ll have OLMo generate a completion for the prompt, “San Francisco is a”
Step 1:
To run OLMo locally, install huggingface-transformers pinned to the commit listed in the prerequisites, along with torch, in a new Python environment:
pip install -U git+https://github.com/huggingface/transformers.git@3cb8676#egg=transformers torch
If running on CPU, install accelerate:
pip install -U 'accelerate>=0.26.0'
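A quick sanity check is to print the installed versions and compare them against the prerequisites above:

import transformers
import torch

# compare these against the prerequisites listed above
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)  # expected to be 2.5.1 or newer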
Step 2:
Load the model and run inference using huggingface-transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# load the model and tokenizer from the Hugging Face Hub
olmo = AutoModelForCausalLM.from_pretrained(
    "shanearora/OLMo-7B-1124-hf",
    torch_dtype=torch.float32,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("shanearora/OLMo-7B-1124-hf")

# tokenize the prompt and sample a completion
message = ["San Francisco is a"]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.5,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
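Depending on your hardware, device_map="auto" may place the model on a GPU while the tokenized inputs stay on the CPU. If generation fails with a device-mismatch error, one common fix (a sketch assuming the model landed on a single device) is to move the inputs to the model's device before calling generate:

# move the input tensors to the device where the model was placed
inputs = {k: v.to(olmo.device) for k, v in inputs.items()}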
This should print a result similar to: San Francisco is a beautiful city, and I love it, but it is also a very expensive city, and I don’t make a lot of money.
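Because the example uses do_sample=True, the completion will vary from run to run. If you prefer a reproducible completion, a minimal variation (reusing olmo, tokenizer, and inputs from the snippet above) is to switch to greedy decoding:

# greedy decoding: deterministic for a fixed model and prompt
response = olmo.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])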
For more information such as model evaluation metrics, model variants, and details about the model architecture, visit the OLMo 2 7B Hugging Face model card page.
Getting started with OLMoE-Mix and Dolmino
OLMoE-Mix is the pretraining dataset for OLMo. OLMoE-mix-0924 is a truly open dataset of 4.07 trillion tokens drawn from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. OLMoE-Mix is used during phase 1 of OLMo-1124's long pretraining; OLMo-1124 is then annealed on Dolmino (see the Dolmino loading sketch at the end of this section).
Prerequisites:
- Huggingface datasets library (pip install datasets)
- Python version 3.8 or newer
from datasets import load_dataset

dataset = load_dataset("allenai/OLMoE-mix-0924", split="train")
Now you can use the dataset like any other Hugging Face dataset!
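Note that the full train split is several trillion tokens, so downloading it all may be impractical. One option is the datasets library's streaming mode, sketched below, which iterates over records without downloading the whole mix first:

from datasets import load_dataset

# stream examples instead of materializing the entire mix on disk
streamed = load_dataset("allenai/OLMoE-mix-0924", split="train", streaming=True)
first_example = next(iter(streamed))
print(first_example)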
For more information such as a breakdown of what content is included in OLMoE-Mix and how it is preprocessed, visit OLMoE-mix’s Hugging Face dataset card page.
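The Dolmino annealing data mentioned above can be loaded the same way. A minimal sketch, assuming the dataset is published under the ID allenai/dolmino-mix-1124 (check Ai2's Hugging Face organization for the exact name and configurations):

from datasets import load_dataset

# dataset ID and default configuration are assumptions; streaming keeps the download small
dolmino = load_dataset("allenai/dolmino-mix-1124", split="train", streaming=True)
print(next(iter(dolmino)))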
Getting started with Tülu 3
Tülu 3 is a top-performing instruction model family with fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
Models based on the best-performing Tülu 3 recipes deliver evaluation results on par with or surpassing those of the Llama 3.1, Qwen 2.5, and Mistral models, and approach the performance of GPT-4. Along with Tülu 3, we released a multi-task evaluation setup that leverages a set of unseen evaluation benchmarks as well as standard benchmark implementations, plus substantially decontaminated versions of existing open datasets.
Visit the Ai2 Playground to interact with Tülu 3. Follow this guide to run llama-tulu-3, the current version available in the Playground, on your local device.
Prerequisites:
- Transformers version 4.45.0 or newer
- Python version 3.8 or newer
In this example, we’ll have llama-tulu-3 generate a response to the query, “What is language modeling?”
Step 1:
To run llama-tulu-3 locally, install huggingface-transformers (at least version 4.45.0) in a new Python environment:
pip install -U transformers
If running on CPU, install accelerate:
pip install -U 'accelerate>=0.26.0'
Step 2:
Load the model and run inference using huggingface-transformers:
import transformers
import torch

model_id = "allenai/llama-tulu-3-8b"

# build a text-generation pipeline for the instruct model
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# chat-formatted prompt for the query described above
messages = [
    {"role": "user", "content": "What is language modeling?"},
]
outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
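With chat-style input, outputs[0]["generated_text"] holds the whole conversation, with the newly generated assistant turn last. To print just the reply text (assuming the message-dict output format of recent transformers releases):

# the final message is the assistant's reply; "content" is its text
print(outputs[0]["generated_text"][-1]["content"])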
For more information such as model evaluation metrics, model variants, and details about the model architecture, visit the llama-tulu-3-8b Hugging Face model card page.
Getting started with tulu-v3-sft-mixture
Tulu-v3-sft-mixture is the supervised fine-tuning (SFT) dataset for llama-tulu-3. It is a fully open mixture of instruction-following conversations drawn from a diverse set of existing open datasets and newly generated synthetic data. We encourage everyone to independently create better versions of Tulu-v3-sft-mixture!
Prerequisites:
- Huggingface datasets library (pip install datasets)
- Python version 3.8 or newer
from datasets import load_dataset

dataset = load_dataset("allenai/tulu-v3-sft-mixture", split="train")
Now you can use the dataset like any other Hugging Face dataset!
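For example, each record in the mixture is a conversation. A quick way to inspect one, assuming the usual messages column of role/content dictionaries used by Ai2's SFT datasets, is:

# print each turn of the first conversation (column names are an assumption)
example = dataset[0]
for message in example["messages"]:
    print(message["role"], ":", message["content"][:200])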
For more information such as a breakdown of what content is included in Tulu-v3-sft-mixture and how it is preprocessed, visit Tulu-v3-sft-mixture’s Hugging Face dataset card page.