Datasets

  • CommonsenseQA

    12,102 multiple-choice questions with one correct answer and four distractor answers
    AI2 Israel, Question Understanding • 2019
    CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers.
  • Discrete Reasoning Over the content of Paragraphs (DROP)

    The DROP dataset contains 96k question-answer pairs (QAs) over 6.7k paragraphs, split into train (77k QAs), development (9.5k QAs), and hidden test (9.5k QAs) partitions.
    AllenNLP, AI2 Irvine • 2019
    DROP is a QA dataset that tests comprehensive understanding of paragraphs. In this crowdsourced, adversarially created, 96k question-answering benchmark, a system must resolve multiple references in a question, map them onto a paragraph, and perform…
  • HellaSwag

    HellaSwag is a dataset for studying grounded commonsense inference.
    Mosaic • 2019
    HellaSwag consists of 70k multiple-choice questions about grounded situations: each question comes from one of two domains, ActivityNet or WikiHow, with four answer choices about what might…
  • SciCite: Citation intent classification dataset

    A large dataset for classifying citation intent based on citation text
    Semantic Scholar • 2019
    Citations play a unique role in scientific discourse and are crucial for understanding and analyzing scientific work. However, not all citations are equal. Some citations refer to the use of a method from another work, some discuss results or findings of other…
  • QuaRel Dataset

    2,771 story questions about qualitative relationships
    Aristo • 2018
    QuaRel is a crowdsourced dataset of 2,771 multiple-choice story questions, including their logical forms.
  • OpenBookQA Dataset

    5,957 multiple-choice questions probing a book of 1,326 science facts
    Aristo • 2018
    OpenBookQA aims to promote research in advanced question answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it…
  • Open Research Corpus

    Over 39 million published research papers in Computer Science, Neuroscience, and Biomedicine
    Semantic Scholar • 2018
    This is a subset of the full Semantic Scholar corpus, which consists of papers crawled from the Web and subjected to a number of filters.
  • ProPara Dataset

    488 richly annotated paragraphs about processes (containing 3,300 sentences)
    Aristo • 2018
    The ProPara dataset is designed to train and test comprehension of simple paragraphs describing processes (e.g., photosynthesis), specifically the task of predicting, tracking, and answering questions about how entities change during the process.
  • PeerRead

    Over 14k paper drafts and over 10k textual peer reviews
    Aristo • 2018
    PeerRead is a dataset of scientific peer reviews available to help researchers study this important artifact.
  • ComplexWebQuestions

    34,689 complex questions and their answers, web snippets, and SPARQL queries
    AI2 Israel, Question Understanding • 2018
    ComplexWebQuestions is a dataset for answering complex questions that require reasoning over multiple web snippets. It contains a large set of complex questions in natural language and can be used in multiple ways: 1) by interacting with a search engine…
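Several datasets in this list (CommonsenseQA, HellaSwag, QuaRel, OpenBookQA) pose multiple-choice questions with exactly one correct answer. As a minimal sketch, assuming a simple in-memory representation, the Python below models one such item and scores predictions by accuracy. The `MultipleChoiceItem` class, its field names, and the `accuracy` helper are hypothetical: they do not reflect any dataset's official file format or evaluation script, and the two example questions are invented for illustration (five choices each, mirroring CommonsenseQA's one correct answer plus four distractors).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MultipleChoiceItem:
    """Hypothetical record for one multiple-choice question.

    Field names are illustrative only, not any dataset's official schema.
    """
    question: str
    choices: tuple[str, ...]
    answer_index: int  # position of the single correct choice in `choices`

    def __post_init__(self) -> None:
        # Reject items whose gold index does not point at an existing choice.
        if not 0 <= self.answer_index < len(self.choices):
            raise ValueError("answer_index must point at one of the choices")


def accuracy(items: Sequence[MultipleChoiceItem], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted choice index equals the gold index."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer_index)
    return correct / len(items)


# Invented example items: one correct answer plus four distractors each.
items = [
    MultipleChoiceItem(
        "Where would you most likely keep milk?",
        ("refrigerator", "bookshelf", "mailbox", "garage", "closet"),
        0,
    ),
    MultipleChoiceItem(
        "What do people usually use to cut paper?",
        ("spoon", "scissors", "pillow", "sponge", "rope"),
        1,
    ),
]

print(accuracy(items, [0, 3]))  # 0.5
```

Storing the gold answer as an index rather than as answer text keeps scoring independent of how a given dataset labels its choices (for example, letters versus numbers).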