Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Harmonic Mobile Manipulation
Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots are still impotent in many household tasks requiring…
RealTime QA: What's the Answer Right Now?
We introduce R EAL T IME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). R E AL T IME QA inquires about the…
A Logic for Expressing Log-Precision Transformers
One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed…
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human…
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with…
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage…
Faith and Fate: Limits of Transformers on Compositionality
Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures…
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models have permeated the machine learning ecosystem. Given an image, these benchmarks…
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these…
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
We present SODA : the first publicly available, million-scale high-quality social dialogue dataset. Using SODA , we train COSMO : a generalizable conversation agent outperforming previous…