Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
PySBD: Pragmatic Sentence Boundary Disambiguation
In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical…
Fact or Fiction: Verifying Scientific Claims
We introduce the task of scientific fact-checking. Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute…
MedICaT: A Dataset of Medical Images, Captions, and Textual References
Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of…
SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search
The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of…
SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and…
TLDR: Extreme Summarization of Scientific Documents
We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression, requiring expert background knowledge and complex language understanding. To…
ABNIRML: Analyzing the Behavior of Neural IR Models
Numerous studies have demonstrated the effectiveness of pretrained contextualized language models such as BERT and T5 for ad-hoc search. However, it is not well understood why these methods are so…
Generative Data Augmentation for Commonsense Reasoning
Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance. However, manual curation of training examples is expensive and has been…
Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project
An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National…
High-Precision Extraction of Emerging Concepts from Scientific Literature
Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual…