Datasets

Viewing 1-10 of 45 datasets
  • IfQA Counterfactual Reasoning Benchmark

    3,800 open-domain questions designed to assess counterfactual reasoning abilities of NLP models
    Aristo • 2023
    Counterfactual reasoning benchmark introduced in the EMNLP 2023 paper titled "IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions".
  • Digital Socrates

    DS Critique Bank contains annotated critiques of answers and explanations from "student" models.
    Aristo • 2023
    DS Critique Bank (DSCB) is a dataset of multiple-choice questions with associated answers and explanations provided by "student models", along with "critiques" of the explanations provided by "critique models". Many of the instances have human annotations.
  • ParRoT (Parts and Relations of Things)

    11,720 “X relation Y?” True/False questions on parts of everyday things and relational information about these parts
    Aristo • 2023
    This is the dataset in "Do language models have coherent mental models of everyday things?", ACL 2023.
  • Belief and Reasoning Dataset

    BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
    Aristo • 2023
    BaRDa is a new belief and reasoning dataset for evaluating the factual correctness ("truth") and reasoning accuracy ("rationality", or "honesty") of new language models. It was created in collaboration with, and with the support of, the Open Philanthropy…
  • Lila

    A math reasoning benchmark of over 140K natural language questions annotated with Python programs
    Aristo • 2022
    A comprehensive benchmark for mathematical reasoning with over 140K natural language questions annotated with Python programs and natural language instructions. The dataset comes with multiple splits: Lila-IID (train, dev, test), Lila-OOD (train, dev, test…
  • Entailer

    Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022
    Aristo • 2022
  • TeachMe

    Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022
    Aristo • 2022
  • Multihop Questions via Single-hop Question Composition

    Multihop reading comprehension dataset with 2-4 hop questions.
    Aristo • 2022
    MuSiQue is a multihop reading comprehension dataset with 2-4 hop questions, built by composing seed questions from 5 existing single-hop datasets. The dataset is constructed with a bottom-up approach that systematically selects composable pairs of single-hop…
  • The Fermi Challenge

    A challenge dataset of Fermi (estimation) problems, currently beyond the capabilities of modern methods.
    Aristo • 2021
  • BeliefBank

    4,998 facts and 12,147 constraints to test a model's consistency
    Aristo • 2021
    Dataset of 4,998 simple facts and 12,147 constraints to test, and improve, a model's accuracy and consistency.