    • Over 39 million published research papers in Computer Science, Neuroscience, and Biomedicine

      This is a subset of the full Semantic Scholar corpus, which consists of papers crawled from the Web and passed through a number of filters.

    • 630 paper annotations

      This dataset consists of annotations for 465 computer science papers. Each annotation indicates whether a citation is important (i.e., refers to ongoing or continued work on the relevant topic) and assigns the citation one of four importance rankings. The dataset was produced at AI2 as part of intern Marco Valenzuela's work for his paper, "Identifying Meaningful Citations".

    • March 2017

      This is the dataset for the paper "Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding". It includes the query log used in the paper, relevance judgements for the queries, ranking lists from Semantic Scholar, candidate documents, entity embeddings trained using the knowledge graph, and the baselines, development methods, and alternative methods from the experiments.