Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in…
The Semantic Scholar Open Data Platform
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website…
FeedLens: Polymorphic Lenses for Personalizing Exploratory Search over Knowledge Graphs
The vast scale and open-ended nature of knowledge graphs (KGs) make exploratory search over them cognitively demanding for users. We introduce a new technique, polymorphic lenses, that improves…
S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications
Mentorship is a critical component of academia, but is not as visible as publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, there…
S2AND: A Benchmark and Evaluation System for Author Name Disambiguation
Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library…
S2ORC: The Semantic Scholar Open Research Corpus
We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated…
Construction of the Literature Graph in Semantic Scholar
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph…
Extracting Scientific Figures with Distantly Supervised Neural Networks
Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven…
Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding
This paper introduces Explicit Semantic Ranking (ESR), a new ranking technique that leverages knowledge graph embedding. Analysis of the query log from our academic search engine,…
PDFFigures 2.0: Mining Figures from Research Papers
Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or…