Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

Kyle Lo, Zejiang Shen, Benjamin Newman, Luca Soldaini
2023
EMNLP

Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in… 

The Semantic Scholar Open Data Platform

Rodney Michael Kinney, Chloe Anastasiades, Russell Authur, Daniel S. Weld
2023
arXiv

The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website… 

FeedLens: Polymorphic Lenses for Personalizing Exploratory Search over Knowledge Graphs

Harmanpreet Kaur, Doug Downey, Amanpreet Singh, Jonathan Bragg
2022
UIST

The vast scale and open-ended nature of knowledge graphs (KGs) make exploratory search over them cognitively demanding for users. We introduce a new technique, polymorphic lenses, that improves…

S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications

Shaurya Rohatgi, Doug Downey, Daniel King, Sergey Feldman
2022
JCDL

Mentorship is a critical component of academia, but is not as visible as publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, there… 

S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

Shivashankar Subramanian, Daniel King, Doug Downey, Sergey Feldman
2021
JCDL

Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library… 

S2ORC: The Semantic Scholar Open Research Corpus

Kyle Lo, Lucy Lu Wang, Mark E Neumann, Daniel S. Weld
2020
ACL

We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated… 

Construction of the Literature Graph in Semantic Scholar

Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, et al.
2018
NAACL-HLT

We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph… 

Extracting Scientific Figures with Distantly Supervised Neural Networks

Noah Siegel, Nicholas Lourie, Russell Power, and Waleed Ammar
2018
JCDL

Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven… 

Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding

Chenyan Xiong, Russell Power, and Jamie Callan
2017
WWW

This paper introduces Explicit Semantic Ranking (ESR), a new ranking technique that leverages knowledge graph embedding. Analysis of the query log from our academic search engine,… 

PDFFigures 2.0: Mining Figures from Research Papers

Christopher Clark and Santosh Divvala
2016
JCDL

Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or…