Datasets

Viewing 21-30 of 68 datasets
  • SciFact

    1.4K expert-written scientific claims paired with evidence-containing abstracts.Semantic Scholar • 2020Due to the rapid growth in the scientific literature, there is a need for automated systems to assist researchers and the public in assessing the veracity of scientific claims. To facilitate the development of systems for this task, we introduce SciFact, a…
  • TORQUE

    A new English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships.AllenNLP • 2020A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have…
  • Contrast Sets

    Contrast sets provide a local view of a model's decision boundary, which can be used to more accurately evaluate a model's true linguistic capabilities.AllenNLP • 2020Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on…
  • CORD-19: COVID-19 Open Research Dataset

    Tens of thousands of scholarly articles about COVID-19 and related coronavirusesSemantic Scholar • 2020CORD-19 is a free resource of tens of thousands of scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses for use by the global research community.
  • Break

    83,978 examples sampled from 10 question answering datasets over text, images and databases.AI2 Israel, Question Understanding • 2020Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases.
  • ARC Direct Answer Questions

    A dataset of 2,985 grade-school level, direct-answer science questions derived from the ARC multiple-choice question set.Aristo • 2020A dataset of 2,985 grade-school level, direct-answer ("open response", "free form") science questions derived from the ARC multiple-choice question set released as part of the AI2 Reasoning Challenge in 2018.
  • S2ORC: The Semantic Scholar Open Research Corpus

    The largest collection of machine-readable academic papers to date for NLP & text mining.Semantic Scholar • 2019A large corpus of 81.1M English-language academic papers spanning many academic disciplines. Rich metadata, paper abstracts, resolved bibliographic references, as well as structured full text for 8.1M open access papers. Full text annotated with…
  • Quoref

    24K Question/Answer (QA) pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs).AllenNLP, AI2 Irvine • 2019Quoref is a QA dataset which tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing 24K questions over 4.7K paragraphs from Wikipedia, a system must resolve hard coreferences before selecting…
  • Reasoning Over Paragraph Effects in Situations (ROPES)

    14k QA pairs over 1.7K paragraphs, split between train (10k QAs), development (1.6k QAs) and a hidden test partition (1.7k QAs).AllenNLP, AI2 Irvine • 2019ROPES is a QA dataset which tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented a background passage containing a causal or qualitative relation(s), a novel situation that uses this background, and…
  • Question Answering via Sentence Composition (QASC)

    9,980 8-way multiple-choice questions about grade school scienceAristo • 2019QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.