Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Ontology-Aware Clinical Abstractive Summarization
Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors. We propose a sequence-to-sequence abstractive…
Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction
Importance: Analyses of female representation in clinical studies have been limited in scope and scale. Objective: To perform a large-scale analysis of global enrollment sex bias in clinical…
Combining Distant and Direct Supervision for Neural Relation Extraction
In relation extraction with distant supervision, noisy labels make it difficult to train quality models. Previous neural models addressed this problem using an attention mechanism that attends to…
Structural Scaffolds for Citation Intent Classification in Scientific Publications
Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated…
Citation Count Analysis for Papers with Preprints
We explore the degree to which papers prepublished on arXiv garner more citations, in an attempt to paint a sharper picture of fairness issues related to prepublishing. A paper’s citation count is…
Construction of the Literature Graph in Semantic Scholar
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph…
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications
Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1), providing…
Content-Based Citation Recommendation
We present a content-based method for recommending citations in an academic paper draft. We embed a given query document into a vector space, then use its nearest neighbors as candidates, and rerank…
Extracting Scientific Figures with Distantly Supervised Neural Networks
Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven…
Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context
Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need…