Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
FloeNet: A mass-conserving global sea ice emulator that generalizes across climates
We introduce FloeNet, a machine-learning emulator trained on the Geophysical Fluid Dynamics Laboratory global sea ice model, SIS2. FloeNet is a mass-conserving model, emulating 6-hour mass and area…
Examining Fast Radiative Feedbacks Using Machine-Learning Weather Emulators
The response of the climate system to increased greenhouse gases and other radiative perturbations is governed by a combination of fast and slow feedbacks. Slow feedbacks are typically activated in…
HiRO-ACE: Fast and skillful AI emulation and downscaling trained on a 3 km global storm-resolving model
Kilometer-scale simulations of the atmosphere are an important tool for assessing local weather extremes and climate impacts, but computational expense limits their use to small regions, short…
Leveraging In-Context Learning for Language Model Agents
In-context learning (ICL) with dynamically selected demonstrations combines the flexibility of prompting large language models (LLMs) with the ability to leverage training data to improve…
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Recent theoretical results show transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational *depth* is bounded. However, prior work treats the…
Exact Expressive Power of Transformers with Padding
Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there…
Language Modeling by Language Models
Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional…
Open-ended Scientific Discovery via Bayesian Surprise
The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language…
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. Unlike traditional benchmarks for scientific literature understanding and…
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. These tasks span…