OpenBookQA Dataset

Aristo • 2018

OpenBookQA aims to promote research in advanced question-answering, probing a deeper understanding of both the topic (with salient facts summarized as an open book, also provided with the dataset) and the language it is expressed in. In particular, it contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension.

License: see the project repository.

OpenBookQA is a new kind of question-answering dataset modeled after open-book exams for assessing human understanding of a subject. It consists of 5,957 multiple-choice elementary-level science questions (4,957 train, 500 dev, 500 test), which probe the understanding of a small “book” of 1,326 core science facts and the application of these facts to novel situations. For training, the dataset includes a mapping from each question to the core science fact it was designed to probe. Answering OpenBookQA questions also requires broad common knowledge that is not contained in the book. By design, the questions are answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. Strong neural baselines achieve around 50% accuracy on OpenBookQA, leaving a large gap to the 92% accuracy of crowd-workers.
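Because systems on OpenBookQA are compared by accuracy on multiple-choice questions, a minimal scoring sketch may help make the metric concrete. It assumes a hypothetical in-memory representation (`MCQuestion`, with choices keyed by answer label) rather than the dataset's actual file schema, and the `accuracy` helper is illustrative only, not an official evaluator. The split sizes are taken from the description above.

```python
from dataclasses import dataclass

# Split sizes as stated above; they sum to the 5,957-question total.
SPLIT_SIZES = {"train": 4957, "dev": 500, "test": 500}
assert sum(SPLIT_SIZES.values()) == 5957


@dataclass
class MCQuestion:
    """Hypothetical in-memory form of one multiple-choice question."""
    qid: str
    stem: str
    choices: dict[str, str]  # answer label -> choice text
    answer_key: str          # label of the correct choice


def accuracy(questions: list[MCQuestion], predictions: dict[str, str]) -> float:
    """Return the fraction of questions whose predicted label equals the answer key.

    A question with no prediction counts as wrong, so the denominator is
    always the full question set.
    """
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(1 for q in questions if predictions.get(q.qid) == q.answer_key)
    return correct / len(questions)


# Toy usage with placeholder questions (not real dataset items).
toy = [
    MCQuestion("q1", "placeholder stem 1", {"A": "w", "B": "x", "C": "y", "D": "z"}, "A"),
    MCQuestion("q2", "placeholder stem 2", {"A": "w", "B": "x", "C": "y", "D": "z"}, "C"),
]
print(accuracy(toy, {"q1": "A", "q2": "B"}))  # 0.5
```

Counting missing predictions as wrong keeps scores comparable across systems that skip questions.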

Additionally, we provide 5,167 crowd-sourced common-knowledge facts and an expanded version of the train/dev/test questions in which each question is associated with its originating core fact, a human accuracy score, a clarity score, and an anonymized crowd-worker ID (in the “Additional” folder).
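The expanded questions pair each item with its core fact and crowd-sourcing metadata. The sketch below parses one record under stated assumptions: JSON Lines input and the key names `id`, `question.stem`, `question.choices`, `answerKey`, `fact1`, `humanScore`, `clarity`, and `turkIdAnonymized`. These key names are an assumption and should be checked against the files in the “Additional” folder before use. `ExpandedQuestion` and `group_by_fact` are hypothetical helpers; the latter shows how the question-to-fact mapping can be inverted to see which questions probe each core fact.

```python
import json
from dataclasses import dataclass


@dataclass
class ExpandedQuestion:
    """One question from the expanded release; field mapping is assumed."""
    qid: str
    stem: str
    choices: dict[str, str]  # answer label -> choice text
    answer_key: str
    core_fact: str           # originating core fact from the open book
    human_score: float       # human accuracy score
    clarity: float           # clarity score
    worker_id: str           # anonymized crowd-worker ID


def parse_expanded_line(line: str) -> ExpandedQuestion:
    """Parse one JSON Lines record (key names are assumptions, see lead-in)."""
    rec = json.loads(line)
    q = rec["question"]
    return ExpandedQuestion(
        qid=rec["id"],
        stem=q["stem"],
        choices={c["label"]: c["text"] for c in q["choices"]},
        answer_key=rec["answerKey"],
        core_fact=rec["fact1"],
        # float() accepts both numbers and numeric strings.
        human_score=float(rec["humanScore"]),
        clarity=float(rec["clarity"]),
        worker_id=rec["turkIdAnonymized"],
    )


def group_by_fact(questions: list[ExpandedQuestion]) -> dict[str, list[str]]:
    """Invert the question-to-fact mapping: core fact -> IDs of questions probing it."""
    groups: dict[str, list[str]] = {}
    for q in questions:
        groups.setdefault(q.core_fact, []).append(q.qid)
    return groups


# Toy record in the assumed layout; the values are placeholders, not real data.
line = json.dumps({
    "id": "q1",
    "question": {"stem": "placeholder stem",
                 "choices": [{"label": "A", "text": "w"}, {"label": "B", "text": "x"}]},
    "answerKey": "B",
    "fact1": "placeholder fact",
    "humanScore": 1.0,
    "clarity": 2.0,
    "turkIdAnonymized": "worker-0",
})
q = parse_expanded_line(line)
print(q.answer_key, q.choices[q.answer_key])  # B x
print(group_by_fact([q]))  # {'placeholder fact': ['q1']}
```

Reading a whole file would apply `parse_expanded_line` to each non-empty line; grouping by fact then makes it easy to check how many questions probe each of the 1,326 core facts.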

Leaderboard

Top Public Submissions

Rank	Submission	Submitted by	Created	Accuracy
1	Opus + Sentence Retrieval + Additional Context + CoT	Aryaman Pattnayak (CraftBench.AI and Shiv Nadar University) and Vaishnav Varma (Shiv Nadar University)	4/25/2024	97%
2	GPT-4 + KB	Liang Yao (Tencent Inc.)	11/3/2023	96%
3	MVP-Tuning Ensemble	SenseTime & CUHK	11/21/2022	95%
4	X-Reasoner	HFL & iFLYTEK Research	7/25/2022	94%
5	KnowGPT (GPT-3.5)	Anonymous	5/22/2024	93%
