
Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

Jonathan Bragg, Mike D'Arcy, Nishant Balepur, Daniel S. Weld

2025

arXiv

AI agents hold great real-world promise, with the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new…

Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference

Mingqi Gao, Yixin Liu, Xinyu Hu, Arman Cohan

2025

NAACL

Evaluating and ranking the capabilities of different LLMs is crucial for understanding their performance and alignment with human preferences. Due to the high cost and time-consuming nature of human…

Social-RAG: Retrieving from Group Interactions to Socially Ground Proactive AI Generation to Group Preferences

Ruotong Wang, Xinyi Zhou, Lin Qiu, Amy X. Zhang

2025

CHI

AI agents are increasingly tasked with making proactive suggestions in online spaces where groups collaborate, but can be unhelpful or even annoying, due to not fitting the group's preferences or…

CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation

Peter Jansen, Oyvind Tafjord, Marissa Radensky, Peter Clark

2025

ACL (Findings)

Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore…

Paloma: A Benchmark for Evaluating Language Model Fit

Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Jesse Dodge

2024

NeurIPS

Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains – varying distributions of…

IdeaSynth: Iterative Research Idea Development Through Evolving and Composing Idea Facets with Literature-Grounded Feedback

Kevin Pu, K. Feng, Tovi Grossman, Pao Siangliulue

2024

arXiv

Research ideation involves both broad exploration and deep refinement of ideas. Both require deep engagement with literature. Existing tools focus primarily on broad idea generation, yet offer little support…

On-the-fly Definition Augmentation of LLMs for Biomedical NER

Monica Munnangi, Sergey Feldman, Byron C Wallace, Aakanksha Naik

2024

NAACL

Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out…

Personalized Jargon Identification for Enhanced Interdisciplinary Communication

Yue Guo, Joseph Chee Chang, Maria Antoniak, Tal August

2024

NAACL

Scientific jargon can impede researchers when they read materials from other domains. Current methods of jargon identification mainly use corpus-level familiarity indicators (e.g., Simple Wikipedia…

Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts

Marissa Radensky, Daniel S. Weld, Joseph Chee Chang, Jonathan Bragg

2024

arXiv

Research-paper blog posts help scientists to disseminate their work to a larger audience, but translating scientific long documents into long-form summaries like blog posts raises unique challenges:…

A Design Space for Intelligent and Interactive Writing Assistants

Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Pao Siangliulue

2024

CHI

In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge…
