Datasets

All Projects
All Years
Viewing 1-10 of 59 datasets
  • ATOMIC 2020

    An atlas of everyday commonsense reasoning, organized through 1.33M textual descriptions of inferential knowledge.Mosaic • 2021We present ATOMIC 2020, a commonsense knowledge graph with 1.33M everyday inferential knowledge tuples about entities and events. ATOMIC 2020 represents a large-scale common sense repository of textual descriptions that encode both the social and the physical aspects of common human everyday experiences, collected with the aim of being complementary to commonsense knowledge encoded in current language models. ATOMIC 2020 introduces 23 commonsense relations types. They can be broadly classified into three categorical types: 9 commonsense relations of social-interaction, 7 physical-entity commonsense relations, and 7 event-centered commonsense relations concerning situations surrounding a given event of interest.
  • ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning

    An atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge.Mosaic • 2021We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., "if X pays Y a compliment, then Y will likely return the compliment").
  • Rainbow: A Commonsense Reasoning Benchmark

    A commonsense reasoning benchmark spanning social and physical common senseMosaic • 2021Rainbow is a universal commonsense reasoning benchmark spanning both social and physical common sense. Rainbow brings together 6 existing commonsense reasoning tasks: aNLI, Cosmos QA, HellaSWAG, Physical IQa, Social IQa, and WinoGrande. Modelers are challenged to develop techniques which capture world knowledge that helps solve this broad suite of tasks.
  • Scruples: Subreddit Corpus Requiring Understanding Principles in Life-like Ethical Situations

    A corpus and benchmark for predicting communities' ethical judgments on real-life anecdotesMosaic • 2021Scruples is a corpus and benchmark for studying descriptive machine ethics, or machines' ability to understand people's ethical judgments. Scruples offers two datasets: the Anecdotes and the Dilemmas. The Anecdotes collect real-life experiences with ethical judgments about them, while the Dilemmas present pairs of simpler actions with crowdsourced judgments on which is less ethical.
  • StrategyQA

    2,780 implicit multi-hop reasoning questionsAI2 Israel, Question Understanding, Aristo • 2021StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
  • ProofWriter

    Updated RuleTaker datasets with 500k questions, answers and proofs over rulebases.Aristo • 2020These datasets accompany the paper "ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language". They contain updated RuleTaker-style datasets with 500k questions, answers and proofs over natural-language rulebases, used to show that Transformers can emulate reasoning over rules expressed in language, including proof generation. It includes variants using closed- and open-world semantics. Proofs include intermediate conclusions. Extra annotations provide data to train the iterative ProofWriter model as well as abductive reasoning to make uncertain statements certain.
  • RuleTaker: Transformers as Soft Reasoners over Language

    Datasets used to teach transformers to reasonAristo • 2020Can transformers be trained to reason (or emulate reasoning) over rules expressed in language? In the associated paper and demo we provide evidence that they can. Our models, that we call RuleTakers, are trained on datasets of synthetic rule bases plus derived conclusions, provided here. The resulting models provide the first demonstration that this kind of soft reasoning over language is indeed learnable.
  • ZEST: ZEroShot learning from Task descriptions

    ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks.AI2 Irvine, Mosaic, AllenNLP • 2020ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.
  • Open PI

    33K state changes over 4,050 sentences from 810 procedural, real-world paragraphsAristo, Mosaic • 2020Open PI is the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. Our solution is a new task formulation in which just the text is provided, from which a set of state changes (entity, attribute, before, after) is generated for each step, where the entity, attribute, and values must all be predicted from an open vocabulary.
  • Real Toxicity Prompts

    A dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.Mosaic • 2020A dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.
All Projects
All Years