Viewing 1-10 of 62 datasets
- Question Answering on Research Papers2021A dataset containing 1585 papers with 5049 information-seeking questions asked by regular readers of NLP papers, and answered by a separate set of NLP practitioners.
- 4998 facts and 12147 constraints to test a model's consistencyAristo • 2021Dataset of 4998 simple facts and 12147 constraints to test, and improve, a model's accuracy and consistency
- 2k multi-step entailment trees, explaining the answers to ARC science questionsAristo • 20212k multi-step entailment trees, explaining the answers to ARC science questions
- An atlas of everyday commonsense reasoning, organized through 1.33M textual descriptions of inferential knowledge.Mosaic • 2021We present ATOMIC 2020, a commonsense knowledge graph with 1.33M everyday inferential knowledge tuples about entities and events. ATOMIC 2020 represents a large-scale common sense repository of textual descriptions that encode both the social and the physical aspects of common human everyday experiences, collected with the aim of being complementary to commonsense knowledge encoded in current language models. ATOMIC 2020 introduces 23 commonsense relations types. They can be broadly classified into three categorical types: 9 commonsense relations of social-interaction, 7 physical-entity commonsense relations, and 7 event-centered commonsense relations concerning situations surrounding a given event of interest.
- An atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge.Mosaic • 2021We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., "if X pays Y a compliment, then Y will likely return the compliment").
- A commonsense reasoning benchmark spanning social and physical common senseMosaic • 2021Rainbow is a universal commonsense reasoning benchmark spanning both social and physical common sense. Rainbow brings together 6 existing commonsense reasoning tasks: aNLI, Cosmos QA, HellaSWAG, Physical IQa, Social IQa, and WinoGrande. Modelers are challenged to develop techniques which capture world knowledge that helps solve this broad suite of tasks.
- A corpus and benchmark for predicting communities' ethical judgments on real-life anecdotesMosaic • 2021Scruples is a corpus and benchmark for studying descriptive machine ethics, or machines' ability to understand people's ethical judgments. Scruples offers two datasets: the Anecdotes and the Dilemmas. The Anecdotes collect real-life experiences with ethical judgments about them, while the Dilemmas present pairs of simpler actions with crowdsourced judgments on which is less ethical.
- 2,780 implicit multi-hop reasoning questionsAI2 Israel, Question Understanding, Aristo • 2021StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
- Updated RuleTaker datasets with 500k questions, answers and proofs over rulebases.Aristo • 2020These datasets accompany the paper "ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language". They contain updated RuleTaker-style datasets with 500k questions, answers and proofs over natural-language rulebases, used to show that Transformers can emulate reasoning over rules expressed in language, including proof generation. It includes variants using closed- and open-world semantics. Proofs include intermediate conclusions. Extra annotations provide data to train the iterative ProofWriter model as well as abductive reasoning to make uncertain statements certain.
- Datasets used to teach transformers to reasonAristo • 2020Can transformers be trained to reason (or emulate reasoning) over rules expressed in language? In the associated paper and demo we provide evidence that they can. Our models, that we call RuleTakers, are trained on datasets of synthetic rule bases plus derived conclusions, provided here. The resulting models provide the first demonstration that this kind of soft reasoning over language is indeed learnable.