An abstract illustration of swirling shapes, meant to denote a futuristic feeling.

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

OLMES: A Standard for Language Model Evaluations

Yuling GuOyvind TafjordBailey KuehlHanna Hajishirzi

2024

arXiv.org

Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to…

SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

Ruihan YangJiangjie ChenYikai ZhangDeqing Yang

2024

technical report

Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in…

Digital Socrates: Evaluating LLMs through explanation critiques

Yuling GuOyvind TafjordPeter Clark

2024

ACL

While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of…

Universal Visual Decomposer: Long-Horizon Manipulation Made Easy

Zichen ZhangYunshuang LiOsbert BastaniLuca Weihs

2024

IEEE International Conference on Robotics and Automation

Real-world robotic tasks stretch over extended horizons and encompass multiple stages. Learning long-horizon manipulation tasks, however, is a long-standing challenge, and demands decomposing the…

Mitigating Barriers to Public Social Interaction with Meronymous Communication

Nouran SolimanHyeonsu B. KangMatthew LatzkeDavid R. Karger

2024

CHI

In communities with social hierarchies, fear of judgment can discourage communication. While anonymity may alleviate some social pressure, fully anonymous spaces enable toxic behavior and hide the…

PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers

Yoonjoo LeeHyeonsu B KangMatt LatzkePao Siangliulue

2024

CHI

With the rapid growth of scholarly archives, researchers subscribe to"paper alert"systems that periodically provide them with recommendations of recently published papers that are similar to…

A Design Space for Intelligent and Interactive Writing Assistants

Mina LeeKaty Ilonka GeroJohn Joon Young ChungPao Siangliulue

2024

CHI

In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge…

Improving Language Models with Advantage-based Offline Policy Gradients

Ashutosh BahetiXiming LuFaeze BrahmanMark O. Riedl

2024

ICLR

Language Models (LMs) achieve substantial language capabilities when finetuned using Reinforcement Learning with Human Feedback (RLHF). However, RLHF is an unstable and data-hungry process that…

TRAM: Bridging Trust Regions and Sharpness Aware Minimization

Tom SherborneNaomi SaphraPradeep DasigiHao Peng

2024

ICLR

By reducing the curvature of the loss surface in the parameter space, Sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on…

The Expressive Power of Transformers with Chain of Thought

William MerrillAshish Sabharwal

2024

ICLR

Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably…

Previous82-91Next