Datasets
Viewing 1-10 of 41 datasets
Lila
A math reasoning benchmark of over 140K natural language questions annotated with Python programsAristo • 2022A comprehensive benchmark for mathematical reasoning with over 140K natural language questions annotated with Python programs and natural language instructions. The data set comes with multiple splits: Lila-IID (train, dev, test), Lila-OOD (train, dev, test…Entailer
Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022Aristo • 2022Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022TeachMe
Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022Aristo • 2022Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022Multihop Questions via Single-hop Question Composition
Multihop reading comprehension dataset with 2-4 hop questions.Aristo • 2022MuSiQue is a multihop reading comprehension dataset with 2-4 hop questions, built by composing seed questions from 5 existing single-hop datasets. The dataset is constructed with a bottom-up approach that systematically selects composable pairs of single-hop…The Fermi Challenge
A challenge dataset of Fermi (estimation) problems, currently beyond the capabilities of modern methods.Aristo • 2021A challenge dataset of Fermi (estimation) problems, currently beyond the capabilities of modern methods.BeliefBank
4998 facts and 12147 constraints to test a model's consistencyAristo • 2021Dataset of 4998 simple facts and 12147 constraints to test, and improve, a model's accuracy and consistencyEntailmentBank
2k multi-step entailment trees, explaining the answers to ARC science questionsAristo • 20212k multi-step entailment trees, explaining the answers to ARC science questionsStrategyQA
2,780 implicit multi-hop reasoning questionsAI2 Israel, Question Understanding, Aristo • 2021StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question…ProofWriter
Updated RuleTaker datasets with 500k questions, answers and proofs over rulebases.Aristo • 2020These datasets accompany the paper "ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language". They contain updated RuleTaker-style datasets with 500k questions, answers and proofs over natural-language rulebases, used to…RuleTaker: Transformers as Soft Reasoners over Language
Datasets used to teach transformers to reasonAristo • 2020Can transformers be trained to reason (or emulate reasoning) over rules expressed in language? In the associated paper and demo we provide evidence that they can. Our models, that we call RuleTakers, are trained on datasets of synthetic rule bases plus…