Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Transparent Human Evaluation for Image Captioning
We establish a rubric-based human evaluation protocol for image captioning models. Our scoring rubrics and their definitions are carefully developed based on machine- and human-generated captions on…
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet…
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that…
Quark: Controllable Text Generation with Reinforced Unlearning
Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a…
Investigating the Benefits of Free-Form Rationales
Free-form rationales aim to aid model interpretability by supplying the background knowledge that can help understand model decisions. Crowdsourced rationales are provided for commonsense QA…
Multimodal Knowledge Alignment with Reinforcement Learning
Large language models readily adapt to novel settings, even without task-specific training data. Can their zero-shot capacity be extended to multimodal inputs? In this work, we propose ESPER which…
NaturalProver: Grounded Mathematical Proof Generation with Language Models
Theorem proving in natural mathematical language - the mixture of symbolic and natural language used by humans - plays a central role in mathematical advances and education, and tests aspects of…
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them. To address this issue, we introduce ProsocialDialog,…
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Despite their impressive capabilities, large pretrained language models (LMs) struggle with consistent reasoning; recently, prompting LMs to generate explanations that self-guide the inference has…
Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions
Generics express generalizations about the world (e.g., “birds can fly”). However, they are not universally true – while sparrows and penguins are both birds, only sparrows can fly and penguins…