Datasets

  • CORD-19: COVID-19 Open Research Dataset

    Semantic Scholar • 2020
    CORD-19 is a free resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses, for use by the global research community.
  • Open Research Corpus

    Semantic Scholar • 2018
    Over 39 million published research papers in Computer Science, Neuroscience, and Biomedicine. This is a subset of the full Semantic Scholar corpus, consisting of papers crawled from the Web and subjected to a number of filters.
  • Explicit Semantic Ranking Dataset

    Semantic Scholar • March 2017
    This is the dataset for the paper "Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding". It includes the query log used in the paper, relevance judgements for the queries, ranking lists from Semantic Scholar, candidate documents, entity embeddings trained using the knowledge graph, and the baselines, development methods, and alternative methods from the experiments.
  • AI2 Meaningful Citations Data Set

    630 paper annotations
    Semantic Scholar • 2014
    This dataset comprises annotations for 465 computer science papers. Each annotation indicates whether a citation is important (i.e., refers to ongoing or continued work on the relevant topic) and assigns the citation one of four importance rankings.