
Ai2 Newsletter

February 2025

A table comparing Tülu 3 405B's performance to other current models across several evaluation benchmarks.

Top story - Tülu 3 405B, the first application of fully open post-training recipes to the largest open-weight models

The latest member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B, with performance on par with GPT-4o and surpassing prior open-weight post-trained models of the same size, including Llama 3.1. We found that the RLVR framework improved MATH performance more significantly at larger scale (405B) than at 70B and 8B, similar to the findings in the DeepSeek-R1 report. Read the blog for more details.
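
The key ingredient, a verifiable reward, can be illustrated with a short sketch: rather than scoring outputs with a learned reward model, the policy is rewarded by a deterministic check against a known answer. The Python snippet below is a toy illustration only; the function names and the answer parser are assumptions for exposition, not the Tülu 3 implementation.

    # Toy illustration of a verifiable reward for RLVR: the reward comes from a
    # deterministic check of the model's output against a known target,
    # not from a learned reward model. Names here are illustrative.
    import re

    def extract_final_answer(completion: str) -> str:
        # Pull the last number out of a completion (toy parser).
        matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
        return matches[-1] if matches else ""

    def verifiable_reward(completion: str, ground_truth: str) -> float:
        # Binary reward: 1.0 if the extracted answer matches the reference, else 0.0.
        return 1.0 if extract_final_answer(completion) == ground_truth.strip() else 0.0

    # Example: a correct worked solution ending in 408 earns reward 1.0.
    print(verifiable_reward("17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408", "408"))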

Ai2 ScholarQA helps with literature review

Can AI really help with literature reviews? Meet Ai2 ScholarQA, an experimental solution that allows you to ask questions that require multiple scientific papers to answer. It gives more in-depth, detailed, and contextual answers with table comparisons, expandable sections for subtopics, and citations with paper excerpts for verification. Read the blog for more details.

(P.S. It's 100% free 😉)

Panama's biggest illegal fishing arrest via Skylight

Using Skylight's high-resolution imagery alongside other tools, Panama detected 16 vessels fully loaded with thousands of pounds of illegally caught yellowfin tuna in the Coiba Ridge MPA, a vital part of the Eastern Tropical Pacific Marine Corridor. We're thrilled that our AI and data delivered real impact for conservation!

Tabletop Red-Teaming for AI Safety with DSRI

Partnering with the Digital Safety Research Institute (DSRI), we organized tabletop red-teaming exercises to assess the opportunities and risks involved in releasing the Open Language Model (OLMo) family. Learn about the results and how we approach AI safety in the joint blog post.

More from us

Ai2 Newsletter Archive