  • Web10K Dataset

    38,176 queries and corresponding 1M+ images returned by Bing Image Search
    PRIOR • 2022
    Web10K is a dataset sourced from web image search data with over 10K concepts. It consists of 38,176 queries and the corresponding 1M+ images returned by Bing Image Search. Web10K provides dense coverage of feasible adjective-noun and verb-noun combinations…
  • The Fermi Challenge

    A challenge dataset of Fermi (estimation) problems, currently beyond the capabilities of modern methods.
    Aristo • 2021
  • Qasper

    Question Answering on Research Papers
    AllenNLP, Semantic Scholar • 2021
    A dataset containing 1,585 papers with 5,049 information-seeking questions asked by regular readers of NLP papers and answered by a separate set of NLP practitioners.
  • BeliefBank

    4,998 facts and 12,147 constraints to test a model's consistency
    Aristo • 2021
    A dataset of 4,998 simple facts and 12,147 constraints for testing, and improving, a model's accuracy and consistency.
  • EntailmentBank

    2k multi-step entailment trees, explaining the answers to ARC science questions
    Aristo • 2021
  • S2AND: Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite

    A dataset to train and evaluate models that perform author disambiguation, i.e., determining who wrote which paper
    Semantic Scholar • 2021
    A unified benchmark dataset for author name disambiguation (AND) on scholarly papers, as well as an open-source reference model implementation. Our dataset harmonizes eight disparate AND datasets into a uniform format, with a single rich feature set drawn from the Semantic Scholar (S2…
  • ATOMIC 2020

    An atlas of everyday commonsense reasoning, organized through 1.33M textual descriptions of inferential knowledge.
    Mosaic • 2021
    We present ATOMIC 2020, a commonsense knowledge graph with 1.33M everyday inferential knowledge tuples about entities and events. ATOMIC 2020 represents a large-scale common sense repository of textual descriptions that encode both the social and the physical…
  • Rainbow: A Commonsense Reasoning Benchmark

    A commonsense reasoning benchmark spanning social and physical common sense
    Mosaic • 2021
    Rainbow is a universal commonsense reasoning benchmark spanning both social and physical common sense. Rainbow brings together 6 existing commonsense reasoning tasks: aNLI, Cosmos QA, HellaSWAG, Physical IQa, Social IQa, and WinoGrande. Modelers are…
  • Scruples: Subreddit Corpus Requiring Understanding Principles in Life-like Ethical Situations

    A corpus and benchmark for predicting communities' ethical judgments on real-life anecdotes
    Mosaic • 2021
    Scruples is a corpus and benchmark for studying descriptive machine ethics, or machines' ability to understand people's ethical judgments. Scruples offers two datasets: the Anecdotes and the Dilemmas. The Anecdotes collect real-life experiences with ethical…
  • StrategyQA

    2,780 implicit multi-hop reasoning questions
    AI2 Israel, Question Understanding, Aristo • 2021
    StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question…