Datasets

Viewing 11-20 of 64 datasets
  • A Dataset of Incomplete Information Reading Comprehension Questions

    13K reading comprehension questions on Wikipedia paragraphs that require following links in those paragraphs to other Wikipedia pages
    AllenNLP • 2020
    IIRC is a crowdsourced dataset consisting of information-seeking questions requiring models to identify and then retrieve necessary information that is missing from the original context. Each original context is a paragraph from English Wikipedia and it comes…
  • ZEST: ZEroShot learning from Task descriptions

    ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.
    AI2 Irvine, Mosaic, AllenNLP • 2020
    ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity…
  • Open PI

    33K state changes over 4,050 sentences from 810 procedural, real-world paragraphs
    Aristo, Mosaic • 2020
    Open PI is the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. Our solution is a new task formulation in which just the text is provided, from which a set of state changes (entity…
  • Real Toxicity Prompts

    A dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.
    Mosaic • 2020
  • eQASC: Multihop Explanations for QASC

    98k annotated explanations for the QASC dataset
    Aristo • 2020
    This dataset contains 98k 2-hop explanations for questions in the QASC dataset, with annotations indicating whether they are valid (~25k) or invalid (~73k) explanations.
  • hasPart KB

    A high-quality KB of hasPart relations
    Aristo • 2020
    A high-quality knowledge base of ~50k hasPart relationships, extracted from a large corpus of generic statements.
  • SciDocs

    Academic paper representation dataset accompanying the SPECTER paper/model
    Semantic Scholar • 2020
    Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives…
  • GenericsKB

    A large knowledge base of generic sentences
    Aristo • 2020
    The GenericsKB contains 3.4M+ generic sentences about the world, i.e., sentences expressing general truths such as "Dogs bark," and "Trees remove carbon dioxide from the atmosphere." Generics are potentially useful as a knowledge source for AI systems…
  • SciFact

    1.4K expert-written scientific claims paired with evidence-containing abstracts.
    Semantic Scholar • 2020
    Due to the rapid growth in the scientific literature, there is a need for automated systems to assist researchers and the public in assessing the veracity of scientific claims. To facilitate the development of systems for this task, we introduce SciFact, a…
  • CORD-19: COVID-19 Open Research Dataset

    Tens of thousands of scholarly articles about COVID-19 and related coronaviruses
    Semantic Scholar • 2020
    CORD-19 is a free resource of tens of thousands of scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses for use by the global research community.