
Ai2 Newsletter

August 2024

Top story - "Brand" new for Ai2

We've been working tirelessly behind the scenes to give Ai2 a fresh new look and feel, and we're delighted to share the fruits of our labor today. Ai2 has a new brand and website, inspired by our aim to make breakthrough AI that solves the world's biggest problems. We've taken elements from our old logo (can you spot the spark?) and fresh ideas from our friends at the brand consultancy Archetype, and we're feeling fresh 💅

AppWorld: a new way to benchmark interactivity

Imagine a world where AI agents can act as your personal assistant, completing tasks for you like setting up a return on Amazon or canceling meetings based on your emails. Introducing AppWorld, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of ~100 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging autonomous agent tasks requiring rich and interactive coding.

Helping AI cite its sources

Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. In this paper, our team explores source-aware training, a method that involves training an LLM to associate unique source document identifiers with the knowledge in each document, followed by instruction-tuning to teach the LLM to cite a supporting pretraining source when prompted.

Can LLMs help discover data-driven scientific hypotheses?

Our team presents DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. DiscoveryBench includes 264 real and 903 synthetic tasks, and we evaluated state-of-the-art LLM-based reasoning frameworks on it. Even the best-performing system scored only 25%, highlighting the challenges in autonomous data-driven discovery.

Gundi, a finalist in Fast Company's Innovation by Design Awards

Gundi, a free “universal adaptor” from our EarthRanger team and others, integrates data and technologies so that conservationists can use the tools they need to protect wildlife.
