Datasets

  • Belief and Reasoning Dataset

    BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
    Aristo • 2023
    BaRDa is a new belief and reasoning dataset for evaluating the factual correctness ("truth") and reasoning accuracy ("rationality", or "honesty") of new language models. It was created in collaboration with, and with the support of, the Open Philanthropy…
  • Lila

    A math reasoning benchmark of over 140K natural language questions annotated with Python programs
    Aristo • 2022
    A comprehensive benchmark for mathematical reasoning with over 140K natural language questions annotated with Python programs and natural language instructions. The data set comes with multiple splits: Lila-IID (train, dev, test), Lila-OOD (train, dev, test…
  • WANLI: Worker-and-AI NLI

    An NLI dataset created via a collaborative approach between language models and crowdworkers
    2022
    WANLI is an NLI dataset of 108K examples created through a novel approach for dataset creation based on worker and AI collaboration, which brings together the generative strength of language models and the evaluative strength of humans. Models trained on…
  • Entailer

    Data for "Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning", EMNLP 2022
    Aristo • 2022
  • TeachMe

    Supplementary data for "Towards Teachable Reasoning Systems: Using a Dynamic Memory ...", EMNLP 2022
    Aristo • 2022
  • Natural Instructions

    A large benchmark of tasks and their language instructions
    2022
    The goal of the Natural-Instructions project is to provide a good-quality benchmark for measuring generalization to unseen tasks. This generalization hinges upon (and benefits from) understanding and reasoning with natural language instructions that plainly and…
  • Multihop Questions via Single-hop Question Composition

    Multihop reading comprehension dataset with 2-4 hop questions.
    Aristo • 2022
    MuSiQue is a multihop reading comprehension dataset with 2-4 hop questions, built by composing seed questions from 5 existing single-hop datasets. The dataset is constructed with a bottom-up approach that systematically selects composable pairs of single-hop…
  • Drug Combinations Dataset

    A Dataset for N-ary Relation Extraction of Drug Combinations
    AI2 Israel • 2022
    Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available…
  • S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications

    A dataset to study mentorship relationships in academia and corporate research labs
    Semantic Scholar • 2022
    Mentorship is a critical component of academia, but is not as visible as publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, there are few large representative mentorship datasets available. We…
  • NumGLUE

    NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
    2022
    Given the ubiquitous nature of numbers in text, reasoning with numbers to perform simple calculations is an important skill of AI systems. While many datasets and models have been developed to this end, state-of-the-art AI systems are brittle; failing to…