Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Unsupervised Distillation of Syntactic Information from Contextualized Word Representations
Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement…
PySBD: Pragmatic Sentence Boundary Disambiguation
In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical…
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition…
The Extraordinary Failure of Complement Coercion Crowdsourcing
Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years. In this work, we follow known methodologies of collecting labeled data for the complement coercion…
A Simple Yet Strong Pipeline for HotpotQA
State-of-the-art models for multi-hop question answering typically augment large-scale language models like BERT with additional, intuitively useful capabilities such as named entity recognition,…
UnifiedQA: Crossing Format Boundaries With a Single QA System
Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit…
Fact or Fiction: Verifying Scientific Claims
We introduce the task of scientific fact-checking. Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute…
TLDR: Extreme Summarization of Scientific Documents
We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression, requiring expert background knowledge and complex language understanding. To…
SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search
The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of…
"You are grounded!": Latent Name Artifacts in Pre-trained Language Models
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g.,…