Datasets

  • Qasper

    Question Answering on Research Papers
    AllenNLP, Semantic Scholar • 2021
    A dataset containing 1,585 papers with 5,049 information-seeking questions asked by regular readers of NLP papers and answered by a separate set of NLP practitioners (a loading sketch appears after this list).
  • IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

    13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages
    AllenNLP • 2020
    IIRC is a crowdsourced dataset consisting of information-seeking questions requiring models to identify and then retrieve necessary information that is missing from the original context. Each original context is a paragraph from English Wikipedia and it comes…
  • ZEST: ZEroShot learning from Task descriptions

    ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.
    Mosaic, AllenNLP • 2020
    ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity…
  • MOCHA

    A benchmark for training and evaluating generative reading comprehension metrics.
    AllenNLP • 2020
    Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap…
  • TORQUE

    A new English reading comprehension benchmark built on 3.2K news snippets with 21K human-generated questions querying temporal relationships.
    AllenNLP • 2020
    A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have…
  • Contrast Sets

    Contrast sets provide a local view of a model's decision boundary, which can be used to more accurately evaluate a model's true linguistic capabilities (a sketch of the contrast-consistency metric appears after this list).
    AllenNLP • 2020
    Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on…
  • Quoref

    24K Question/Answer (QA) pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs).
    AllenNLP • 2019
    Quoref is a QA dataset which tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing 24K questions over 4.7K paragraphs from Wikipedia, a system must resolve hard coreferences before selecting…
  • Reasoning Over Paragraph Effects in Situations (ROPES)

    14K QA pairs over 1.7K paragraphs, split between train (10K QAs), development (1.6K QAs) and a hidden test partition (1.7K QAs).
    AllenNLP • 2019
    ROPES is a QA dataset which tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations, a novel situation that uses this background, and…
  • Discrete Reasoning Over the content of Paragraphs (DROP)

    The DROP dataset contains 96K question-answer (QA) pairs over 6.7K paragraphs, split between train (77K QAs), development (9.5K QAs) and a hidden test partition (9.5K QAs).
    AllenNLP • 2019
    Many diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given…
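
Several of these datasets are distributed through common dataset libraries. Below is a minimal sketch of loading and inspecting Qasper with the Hugging Face datasets library; the Hub ID "allenai/qasper" refers to a third-party mirror and is an assumption, not something stated in this listing, so verify it against the actual release before relying on it.

    # Minimal loading sketch (Python); the Hub ID is an assumption.
    from datasets import load_dataset

    qasper = load_dataset("allenai/qasper")   # assumed Hub ID for the Qasper release
    print(qasper)                             # available splits and their sizes

    example = qasper["train"][0]              # one paper with its questions and answers
    print(example.keys())                     # inspect the schema instead of assuming field names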
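
The Contrast Sets entry above evaluates a model on small bundles of minimally perturbed test instances, and the accompanying paper reports a "contrast consistency" score: the fraction of contrast sets on which a model answers every instance (the original plus its perturbations) correctly. The sketch below illustrates that metric with hypothetical instance IDs and a plain exact-match check; a real task would substitute its own per-instance correctness function.

    # Hedged sketch of the contrast-consistency metric; names and the exact-match
    # check are illustrative, not the official evaluation code.
    from typing import Dict, List

    def contrast_consistency(predictions: Dict[str, str],
                             gold: Dict[str, str],
                             contrast_sets: List[List[str]]) -> float:
        """Fraction of contrast sets on which the model gets every instance right."""
        consistent = sum(
            all(predictions[i] == gold[i] for i in ids)
            for ids in contrast_sets
        )
        return consistent / len(contrast_sets)

    # Toy usage: two contrast sets of two instances each; the model is consistent on one.
    gold = {"q1": "yes", "q1-perturbed": "no", "q2": "blue", "q2-perturbed": "red"}
    preds = {"q1": "yes", "q1-perturbed": "no", "q2": "blue", "q2-perturbed": "blue"}
    print(contrast_consistency(preds, gold, [["q1", "q1-perturbed"], ["q2", "q2-perturbed"]]))  # 0.5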