Datasets

  • GenericsKB

    A large knowledge base of generic sentences
    Aristo • 2020
    The GenericsKB contains 3.4M+ generic sentences about the world, i.e., sentences expressing general truths such as "Dogs bark," and "Trees remove carbon dioxide from the atmosphere." Generics are potentially useful as a knowledge source for AI systems…
  • ARC Direct Answer Questions

    A dataset of 2,985 grade-school level, direct-answer science questions derived from the ARC multiple-choice question set.
    Aristo • 2020
    A dataset of 2,985 grade-school level, direct-answer ("open response", "free form") science questions derived from the ARC multiple-choice question set released as part of the AI2 Reasoning Challenge in 2018.
  • Question Answering via Sentence Composition (QASC)

    9,980 8-way multiple-choice questions about grade school science
    Aristo • 2019
    QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
  • QuaRTz Dataset

    3,864 questions about open domain qualitative relationships
    Aristo • 2019
    QuaRTz is a crowdsourced dataset of 3,864 multiple-choice questions about open domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).
  • ARC Question Classification Dataset

    7,787 multiple-choice questions annotated with question classification labels
    Aristo • 2019
    A dataset of detailed problem domain classification labels for each of the 7,787 multiple-choice science questions found in the AI2 Reasoning Challenge (ARC) dataset, to enable targeted pairing of questions with problem-specific solvers. Also included is a…
  • What-If Question Answering

    Large-scale dataset of 39,705 "What if..." questions over procedural text
    Aristo • 2019
    The WIQA dataset V1 has 39,705 questions containing a perturbation and a possible effect in the context of a paragraph. The dataset is split into 29,808 train questions, 6,894 dev questions, and 3,003 test questions.
  • QuaRel Dataset

    2,771 story questions about qualitative relationships
    Aristo • 2018
    QuaRel is a crowdsourced dataset of 2,771 multiple-choice story questions, including their logical forms.
  • OpenBookQA Dataset

    5,957 multiple-choice questions probing a book of 1,326 science facts
    Aristo • 2018
    OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it…
  • ProPara Dataset

    488 richly annotated paragraphs about processes (containing 3,300 sentences)
    Aristo • 2018
    The ProPara dataset is designed to train and test comprehension of simple paragraphs describing processes (e.g., photosynthesis). It targets the task of predicting, tracking, and answering questions about how entities change during the process.
  • PeerRead

    Over 14K paper drafts and over 10K textual peer reviews
    Aristo • 2018
    PeerRead is a dataset of scientific peer reviews available to help researchers study this important artifact.