Datasets

  • IfQA Counterfactual Reasoning Benchmark

    3,800 open-domain questions designed to assess counterfactual reasoning abilities of NLP models
    Aristo • 2023
    Counterfactual reasoning benchmark introduced in the EMNLP 2023 paper titled "IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions".
  • Digital Socrates

    DS Critique Bank contains annotated critiques of answers and explanations from "student" models.
    Aristo • 2023
    DS Critique Bank (DSCB) is a dataset of multiple-choice questions with associated answers and explanations provided by "student models", along with "critiques" of the explanations provided by "critique models". Many of the instances have human annotations.
  • Satlas Explorer

    Satlas Explorer applies ML on satellite imagery to derive a wide range of geospatial data.
    2023
    Satlas Explorer is a demonstration of the use of AI to extract a variety of interesting data from satellite imagery, which can provide a near-real-time understanding of how our planet is changing. The current release contains predictions for: (1) the…
  • ParRoT (Parts and Relations of Things)

    11,720 “X relation Y?” True/False questions on parts of everyday things and relational information about these parts
    Aristo • 2023
    This is the dataset in "Do language models have coherent mental models of everyday things?", ACL 2023.
  • Belief and Reasoning Dataset

    BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
    Aristo • 2023
    BaRDa is a new belief and reasoning dataset for evaluating the factual correctness ("truth") and reasoning accuracy ("rationality", or "honesty") of new language models. It was created in collaboration with, and with the support of, the Open Philanthropy…
  • Lila

    A math reasoning benchmark of over 140K natural language questions annotated with Python programs
    Aristo • 2022
    A comprehensive benchmark for mathematical reasoning with over 140K natural language questions annotated with Python programs and natural language instructions. The data set comes with multiple splits: Lila-IID (train, dev, test), Lila-OOD (train, dev, test…
  • WANLI: Worker-and-AI NLI

    An NLI dataset created via a collaborative approach between language models and crowdworkers
    2022
    WANLI is an NLI dataset of 108K examples created through a novel approach for dataset creation based on worker and AI collaboration, which brings together the generative strength of language models and the evaluative strength of humans. Models trained on…
  • Entailer

    Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022
    Aristo • 2022
  • TeachMe

    Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022
    Aristo • 2022
  • Natural Instructions

    A large benchmark of tasks and their language instructions
    2022
    The goal of the Natural-Instructions project is to provide a good quality benchmark for measuring generalization to unseen tasks. This generalization hinges upon (and benefits from) understanding and reasoning with natural language instructions that plainly and…