
Ai2 Newsletter

June 2025

Top story - RewardBench 2

We recently launched RewardBench 2, an updated benchmark designed to better evaluate reward models using more challenging, accuracy-focused data. Building on lessons from our first reward model evaluation tool, we made this version substantially harder and more strongly correlated with both downstream RLHF performance and inference-time scaling, enabling more rigorous and reliable evaluation.

AMA

Last month, a handful of our researchers gathered to answer questions live on Reddit's r/huggingface. It was an exciting opportunity to hear what people are doing with our models and what they hope we'll do next. Some of the questions we got:

  • What do you think is the biggest challenge when building a fully open-source model compared to a closed one?
  • What are the preferred ways for developers to approach the Ai2 researchers to discuss coding with OLMo? Obviously, there are Ai2's GitHub repos and the Ai2 Discord. Are there any additional non-obvious channels?
  • What are some interesting things you’ve learned using OLMoTrace?

To find out the answers, check out the Reddit thread with 100+ comments below!

EmTech AI

How do we build AI systems that are safe, transparent, and worthy of public trust?

CEO Ali Farhadi took the stage at EmTech AI 2025 to explain why AI systems must be open by default, with transparency at every stage of how we build, share, and govern them.

In his session, "Building Confidence in AI Through Transparency," he examined the challenges of building trustworthy AI models and how business leaders and researchers can ensure AI systems operate as expected while continuing to evolve responsibly.

New Research

Do LLMs learn language via rules or analogies?

This may come as a surprise to many: models rely heavily on stored examples and draw analogies when dealing with unfamiliar words, much as humans do. Check out this new study to learn how the researchers made the discovery.

More from us

Ai2 Newsletter Archive