Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Leveraging In-Context Learning for Language Model Agents

Shivanshu Gupta, Sameer Singh, Ashish Sabharwal, Ben Bogin

2025

NeurIPS • Workshop on Multi-Turn Interactions in LLMs

In-context learning (ICL) with dynamically selected demonstrations combines the flexibility of prompting large language models (LLMs) with the ability to leverage training data to improve…

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

William Merrill, Ashish Sabharwal

2025

NeurIPS

Recent theoretical results show transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational *depth* is bounded. However, prior work treats the…

Exact Expressive Power of Transformers with Padding

William Merrill, Ashish Sabharwal

2025

NeurIPS

Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there…

Language Modeling by Language Models

Junyan Cheng, Peter Clark, Kyle Richardson

2025

NeurIPS

Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional…

Open-ended Scientific Discovery via Bayesian Surprise

Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Peter Clark

2025

NeurIPS

The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language…

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

Yilun Zhao, Kaiyan Zhang, Tiansheng Hu, Arman Cohan

2025

NeurIPS

We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. Unlike traditional benchmarks for scientific literature understanding and…

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

David Wadden, Kejian Shi, Jacob Daniel Morrison, Arman Cohan

2025

EMNLP

We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. These tasks span…

Intent-Aware Schema Generation And Refinement For Literature Review Tables

Vishakh Padmakumar, Joseph Chee Chang, Kyle Lo, Aakanksha Naik

2025

EMNLP

The increasing volume of academic literature makes it essential for researchers to organize, compare, and contrast collections of documents. Large language models (LLMs) can support this process by…

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

Yanhong Li, Zixuan Lan, Jiawei Zhou

2025

EMNLP

Large language models (LLMs) and their multimodal variants can now process visual inputs, including images of text. This raises an intriguing question: can we compress textual inputs by feeding them…

MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents

Tomer Wolfson, Harsh Trivedi, Mor Geva, Reut Tsarfaty

2025

TACL

Automated agents, powered by large language models (LLMs), are emerging as the go-to tool for querying information. However, evaluation benchmarks for LLM agents rarely feature natural questions…
