Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE , a comparison measure…
MERLOT: Multimodal Neural Script Knowledge Models
As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model…
NaturalProofs: Mathematical Theorem Proving in Natural Language
Understanding and creating mathematics using natural mathematical language – the mixture of symbolic and natural language used by humans – is a challenging and important problem for driving progress…
Contrastive Explanations for Model Interpretability
Contrastive explanations clarify why an event occurred in contrast to another. They are more inherently intuitive to humans to both produce and comprehend. We propose a methodology to produce…
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
As language models are trained on ever more text, researchers are turning to some of the largest corpora available. Unlike most other types of datasets in NLP, large unlabeled text corpora are often…
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences
In social settings, much of human behavior is governed by unspoken rules of conduct. For artificial systems to be fully integrated into social environments, adherence to such norms is a central…
proScript: Partially Ordered Scripts Generation
Scripts standardized event sequences describing typical everyday activities have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated…
Sister Help: Data Augmentation for Frame-Semantic Role Labeling
While FrameNet is widely regarded as a rich resource of semantics in natural language processing, a major criticism concerns its lack of coverage and the relative paucity of its labeled data…
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Large language models have shown promising results in zero-shot settings (Brown et al., 2020; Radford et al., 2019). For example, they can perform multiple choice tasks simply by conditioning on a…
Can Machines Learn Morality? The Delphi Experiment
As AI systems become increasingly powerful and pervasive, there are growing concerns about machines’ morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality…