Documentation
Getting started with Tülu 3
Tülu 3 is a top-performing instruction model family with fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
Tülu 3 405B, the newest member of the Tülu family, demonstrates that the Tülu 3 post-training recipe scales effectively to Llama 3.1 405B. It achieves performance competitive with or superior to both DeepSeek V3 and GPT-4o, and surpasses prior open-weight post-trained models of the same size, including Llama 3.1 405B Instruct, on many standard benchmarks.
Along with Tülu 3, we released a multi-task evaluation setup that covers both unseen evaluation benchmarks and standard benchmark implementations, as well as substantially decontaminated versions of existing open datasets.
Visit the Ai2 Playground to interact with Tülu 3. Follow this guide to run llama-tulu-3 on your local device.
Prerequisites:
- Transformers versions v4.45.0 or newer
- Python version 3.8 or newer
In this example, we’ll have llama-tulu-3 generate a response to the query, “What is language modeling?”
Step 1:
To run llama-tulu-3 locally, install huggingface-transformers (at least version 4.45.0) in a new Python environment:
pip install -U transformers
If running on CPU, install accelerate:
pip install -U 'accelerate>=0.26.0'
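Optionally, you can verify that the environment satisfies the prerequisites before loading the model. A minimal check, assuming the packaging library (which ships as a Transformers dependency) is available:

import sys
import transformers
from packaging.version import Version

# Mirror the prerequisites listed above: Python >= 3.8, Transformers >= 4.45.0
assert sys.version_info >= (3, 8), "Python 3.8 or newer is required"
assert Version(transformers.__version__) >= Version("4.45.0"), \
    "Transformers 4.45.0 or newer is required"
print("Environment OK")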
Step 2:
Load the model and run inference using huggingface-transformers:
import transformers
import torch

model_id = "allenai/llama-tulu-3-8b"

# Build a text-generation pipeline; device_map="auto" places the model
# on GPU if available (this requires accelerate)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style input matching the example query above
messages = [
    {"role": "user", "content": "What is language modeling?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
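With chat-style input, the pipeline returns the conversation under generated_text, and the last element is the assistant's message as a role/content dict. To print only the reply text:

print(outputs[0]["generated_text"][-1]["content"])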
For more information such as model evaluation metrics, model variants, and details about the model architecture, visit the Tülu 3 family on Hugging Face.
Getting started with OLMoE
OLMoE, a member of the OLMo family, is the first high-performing Mixture-of-Experts LLM that is 100% open-source. The model has 1B active parameters and 7B total parameters, and was trained on 5T tokens. Performance-wise, OLMoE is state of the art among models with a similar active-parameter cost (~1B) and is competitive with larger models such as Llama2-13B.
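To see the expert layout behind the 1B-active/7B-total split without downloading the full weights, you can inspect the model's config. A minimal sketch, assuming the OLMoE config exposes the num_experts and num_experts_per_tok fields:

from transformers import AutoConfig

# Fetches only the config JSON, not the model weights
config = AutoConfig.from_pretrained("allenai/OLMoE-1B-7B-0125")
print("experts per MoE layer:", config.num_experts)
print("experts activated per token:", config.num_experts_per_tok)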
Visit the Ai2 Playground to interact with OLMoE. Follow this guide to run OLMoE-1B-7B, the current version available in the Playground, on your local device.
Prerequisites:
- Transformers versions v4.45.0 or newer
- Python version 3.8 or newer
In this example, we’ll have OLMoE generate a response to the query, “What is Bitcoin?”
Step 1:
To run OLMoE locally, install huggingface-transformers (at least version 4.45.0) in a new Python environment:
pip install -U transformers
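Before moving on, you can confirm that your Transformers build includes the OLMoE model class used in Step 2 (see the troubleshooting note below if this fails):

# Raises an ImportError if this Transformers build predates OLMoE support
from transformers import OlmoeForCausalLM
print("OLMoE support available")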
Step 2:
Load the model and run inference using huggingface-transformers:
from transformers import OlmoeForCausalLM, AutoTokenizer
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load different checkpoints by passing e.g. `revision=step10000-tokens41B`
model = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0125").to(DEVICE)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0125")

inputs = tokenizer("What is Bitcoin?", return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
out = model.generate(**inputs, max_length=64)
print(tokenizer.decode(out[0]))

This should print a result similar to: 'Bitcoin is a digital currency that is created and held electronically. No one controls it. Bitcoins aren’t printed, like dollars or euros – they’re produced by people and businesses running computers all around the world, using software that solves mathematical'
If you hit a “model not found” error, your installed Transformers release may not yet include OLMoE support. In the meantime, install Transformers from the main branch using:
pip install --upgrade git+https://github.com/huggingface/transformers.git
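Separately, as the comment in the Step 2 snippet notes, intermediate pretraining checkpoints can be loaded by passing a revision:

# Load an intermediate checkpoint instead of the final weights,
# using the revision format shown in the Step 2 comment
model = OlmoeForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0125", revision="step10000-tokens41B"
).to(DEVICE)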
For more information such as model evaluation metrics, model variants, and details about the model architecture, visit the OLMoE 1B-7B 0125 Hugging Face model card.
Getting started with OLMo
OLMo is a series of Open Language Models released with the data, training code, model weights, and evaluation code needed to advance AI and collectively study language models.
Visit the Ai2 Playground to interact with OLMo. Follow this guide to run OLMo 2 on your local device.
Prerequisites:
- Transformers from git commit 3cb8676
- Torch version 2.5.1 or newer
- Python version 3.8 or newer
In this example, we’ll have OLMo generate a completion for the prompt, “San Francisco is a”
Step 1:
To run OLMo locally, install huggingface-transformers pinned to the git commit listed in the prerequisites, along with torch, in a new Python environment:
pip install -U git+https://github.com/huggingface/transformers.git@3cb8676 torch
If running on CPU, install accelerate:
pip install -U 'accelerate>=0.26.0'
Step 2:
Load the model and run inference using huggingface-transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

olmo = AutoModelForCausalLM.from_pretrained(
    "shanearora/OLMo-7B-1124-hf",
    torch_dtype=torch.float32,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("shanearora/OLMo-7B-1124-hf")

message = ["San Francisco is a"]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)

# Sample a completion of up to 128 new tokens
response = olmo.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.5,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
This should print a result similar to: San Francisco is a beautiful city, and I love it, but it is also a very expensive city, and I don’t make a lot of money.
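Because the call above samples (do_sample=True with top_k, top_p, and temperature), the exact completion varies between runs. For deterministic output, a minimal variant switches to greedy decoding:

# Greedy decoding: reproducible, picks the highest-probability token each step
response = olmo.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])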
For more information such as model evaluation metrics, model variants, and details about the model architecture, visit the OLMo 2 13B Instruct Hugging Face model card page.