Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Probing Factually Grounded Content Transfer with Factual Ablation
Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality – it cannot be meaningfully improved…
ScienceWorld: Is your Agent Smarter than a 5th Grader?
This paper presents a new benchmark, SCIENCEWORLD, to test agents’ scientific reasoning abilities in a new interactive text environment at the level of a standard elementary school science…
Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation
While there has been a lot of research and many recent advances in neural fake news detection, defending against human-written disinformation remains underexplored. Upon analyzing current approaches…
Knowledge is Power: Symbolic Knowledge Distillation, Commonsense Morality, & Multimodal Script Knowledge
Scale appears to be the winning recipe in today's AI leaderboards. And yet, extreme-scale neural models are still brittle, making errors that are often nonsensical and even counterintuitive. In this…
Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models
Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge enables people to interpret story narratives and identify salient…
Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow
Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge of narrative event flow enables people to weave together a story.…
PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts
Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of…
UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training
We present UNIFIEDQA-v2, a QA model built with the same process as UNIFIEDQA, except that it utilizes more supervision – roughly 3× the number of datasets used for UNIFIEDQA. This generally leads to…
Inherently Explainable Reinforcement Learning in Natural Language
We focus on the task of creating a reinforcement learning agent that is inherently explainable—with the ability to produce immediate local explanations by thinking out loud while performing a task…
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification
Constructing benchmarks that test the abilities of modern natural language understanding models is difficult – pre-trained language models exploit artifacts in benchmarks to achieve human…