Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with…
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage…
Faith and Fate: Limits of Transformers on Compositionality
Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures…
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these…
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
We present SODA: the first publicly available, million-scale, high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous…
We're Afraid Language Models Aren't Modeling Ambiguity
Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our…
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Despite the much-discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects…
Language Models with Rationality
While large language models (LLMs) are proficient at question-answering (QA), the dependencies between their answers and other "beliefs" they may have about the world are typically unstated, and may…
Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy
When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer…
Editing Common Sense in Transformers
Editing model parameters directly in Transformers makes it possible to update open-source transformer-based models without re-training. However, these editing methods have only been evaluated on…