Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Edited Media Understanding Frames: Reasoning about the Intent and Implications of Visual Disinformation
Multimodal disinformation, from 'deepfakes' to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless, such as a filtered…
PAWLS: PDF Annotation With Labels and Structure
Adobe’s Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup. This presents a challenge to NLP practitioners who wish to use the information…
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering…
Explaining Relationships Between Scientific Documents
We address the task of explaining relationships between two scientific documents using natural language text. This task requires modeling the complex content of long technical documents, deducing a…
ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language
Transformers have been shown to emulate logical deduction over natural language theories (logical rules expressed in natural language), reliably assigning true/false labels to candidate…
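ProofWriter's setting is deduction over a theory made of facts and rules. For reference, the implications such a theory entails can be computed symbolically by classical forward chaining: apply every rule whose premises hold, add its conclusion, and repeat until nothing new is derived. The sketch below does this over a hypothetical toy theory encoded as (entity, attribute) pairs. It is not ProofWriter's model, which works directly on natural-language text; the names `forward_chain` and `implications` are illustrative only.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]             # hypothetical encoding, e.g. ("Bob", "big")
Rule = Tuple[FrozenSet[str], str]  # (premise attributes, conclusion attribute)


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply rules repeatedly until no new fact can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        entities = {entity for entity, _ in derived}
        for entity in entities:
            # Attributes currently known to hold for this entity.
            attrs = {attr for e, attr in derived if e == entity}
            for premises, conclusion in rules:
                if premises <= attrs and (entity, conclusion) not in derived:
                    derived.add((entity, conclusion))
                    changed = True  # new fact: another pass may fire more rules
    return derived


def implications(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Facts entailed by the theory that were not stated explicitly."""
    return forward_chain(facts, rules) - set(facts)


if __name__ == "__main__":
    facts = {("Bob", "big"), ("Bob", "red")}
    rules = [
        (frozenset({"big"}), "strong"),          # "Big things are strong."
        (frozenset({"strong", "red"}), "kind"),  # "Strong, red things are kind."
    ]
    print(sorted(implications(facts, rules)))
    # [('Bob', 'kind'), ('Bob', 'strong')]
```

Deriving "kind" takes two passes because it depends on "strong", which is itself derived; the depth of such chains is what makes multi-step proofs harder than single-rule lookups.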
Efficient Passage Retrieval with Hashing for Open-domain Question Answering
Most state-of-the-art open-domain question answering systems use a neural retrieval model to encode passages into continuous vectors and extract them from a knowledge source. However, such retrieval…
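The teaser describes retrieval over continuous passage vectors, and the title points to hashing as the alternative. As a generic illustration of that idea, and not the paper's trained model, the sketch below hashes real-valued embeddings to sign bits, shortlists passages by Hamming distance, and reranks the shortlist by the inner product between the real-valued query and the {-1, +1} passage codes, so only the binary codes need to be stored. The embeddings are random stand-ins, and all function names and parameters (`encode_binary`, `search`, `num_candidates`, etc.) are hypothetical; a real system would learn the encoder and the hash.

```python
import numpy as np


def encode_binary(vectors: np.ndarray) -> np.ndarray:
    """Hash each real-valued embedding to packed sign bits (bit = 1 if component > 0)."""
    return np.packbits(vectors > 0, axis=1)  # shape (n, ceil(d / 8)), dtype uint8


def unpack_signs(codes: np.ndarray, dim: int) -> np.ndarray:
    """Expand packed codes back to {-1, +1} vectors of length dim for reranking."""
    bits = np.unpackbits(codes, axis=1)[:, :dim].astype(np.float32)
    return 2.0 * bits - 1.0


def hamming(query_code: np.ndarray, passage_codes: np.ndarray) -> np.ndarray:
    """Number of differing bits between one packed query code and each passage code."""
    xor = np.bitwise_xor(passage_codes, query_code)  # broadcasts over passages
    return np.unpackbits(xor, axis=1).sum(axis=1)


def search(query_vec: np.ndarray, passage_codes: np.ndarray,
           num_candidates: int = 100, top_k: int = 5) -> np.ndarray:
    """Shortlist passages by Hamming distance, then rerank the shortlist by the
    inner product of the real-valued query with the {-1, +1} passage codes."""
    dim = query_vec.shape[0]
    query_code = encode_binary(query_vec[None, :])[0]
    dists = hamming(query_code, passage_codes)
    k = min(num_candidates, passage_codes.shape[0])
    candidates = np.argsort(dists, kind="stable")[:k]
    scores = unpack_signs(passage_codes[candidates], dim) @ query_vec
    order = np.argsort(-scores, kind="stable")[:top_k]
    return candidates[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    passages = rng.standard_normal((1000, 64)).astype(np.float32)  # stand-in embeddings
    codes = encode_binary(passages)  # 8 bytes per passage instead of 256 (64 x float32)
    query = passages[42] + 0.1 * rng.standard_normal(64).astype(np.float32)
    print(search(query, codes, top_k=3))
```

With 64-dimensional float32 vectors, the binary index is 32 times smaller (8 bytes versus 256 bytes per passage). The cheap Hamming pass narrows the search before the slightly more expensive rerank.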
Prompting Contrastive Explanations for Commonsense Reasoning Tasks
Many commonsense reasoning NLP tasks involve choosing between one or more possible answers to a question or prompt based on knowledge that is often implicit. Large pretrained language models (PLMs)…
fv3gfs-wrapper: a Python wrapper of the FV3GFS atmospheric model
Simulation software in geophysics is traditionally written in Fortran or C++ due to the stringent performance requirements these codes have to satisfy. As a result, researchers who use…
Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to…
Correcting weather and climate models by machine learning nudged historical simulations
Due to limited resolution and inaccurate physical parameterizations, weather and climate models consistently develop biases compared to the observed atmosphere. Using the FV3GFS model at coarse…
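The title refers to nudging: during a historical simulation, the model state x is relaxed toward an observation-based reference x_ref by adding a tendency (x_ref - x) / tau with relaxation timescale tau. The scalar toy below is a generic sketch of that mechanism under made-up assumptions (a hypothetical model that relaxes toward 2.0 while the "observed" reference is 0.0, forward-Euler stepping, tau = 0.5); it is not FV3GFS or the paper's setup. It shows that once the nudged run settles, the recorded nudging tendency cancels the model's erroneous tendency, so that term carries information about the model's bias.

```python
import numpy as np


def run_nudged(x0, model_tendency, reference, dt, tau, n_steps):
    """Integrate dx/dt = f(x) + (x_ref - x) / tau with forward Euler.

    Returns the final state and the nudging tendency applied at every step.
    """
    x = np.asarray(x0, dtype=float)
    nudging_history = []
    for _ in range(n_steps):
        nudging = (reference - x) / tau  # relaxation toward the reference state
        x = x + dt * (model_tendency(x) + nudging)
        nudging_history.append(nudging)
    return x, np.array(nudging_history)


def biased_model(x):
    """Hypothetical toy model that relaxes toward 2.0 instead of the true 0.0."""
    return -(x - 2.0) / 5.0


if __name__ == "__main__":
    x_final, nudges = run_nudged(5.0, biased_model, reference=0.0,
                                 dt=0.1, tau=0.5, n_steps=200)
    # At steady state the nudging cancels the model's spurious tendency.
    print(float(x_final), float(nudges[-1]), float(biased_model(x_final)))
```

Setting f(x) + (0 - x) / tau = 0 gives x = 2/11 for the nudged run, with a nudging tendency of -4/11 that exactly offsets the model tendency of +4/11. Without nudging (tau set to infinity), the same toy model drifts to its biased equilibrium of 2.0.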