    • 27,026 statements

      The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and the correct answer choice are converted into an assertive statement to form the hypothesis.

    • 5,059 real science exam questions derived from a variety of regional and state science exams

      The AI2 Science Questions dataset consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in 4-way multiple-choice format and may or may not include a diagram.

    • 13,679 science questions with supporting sentences

      The SciQ dataset contains 13,679 crowdsourced science exam questions about Physics, Chemistry and Biology, among others. The questions are in multiple-choice format with 4 answer options each. For the majority of the questions, an additional paragraph with supporting evidence for the correct answer is provided.

    • 156K sentences for 4th grade questions, 107K sentences for 8th grade questions, and derived tuples

      The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in "Answering Complex Questions Using Open Information Extraction" (referred to as Tuple KB, T). These sentences were collected from a large Web corpus using training questions from 4th and 8th grade as queries. This dataset contains 156K sentences collected for 4th grade questions and 107K sentences for 8th grade questions. Each sentence is followed by its Open IE v4 tuples in Open IE's simple output format.

    • 9,356 science terms and sentences

      This is the dataset for the paper Leveraging Term Banks for Answering Complex Questions: A Case for Sparse Vectors. The dataset contains 9,356 science terms and, for each term, an average of 16,000 sentences that contain the term.

    • 1,076 textbook lessons, 26,260 questions, 6,229 images

      The TextbookQuestionAnswering (TQA) dataset is drawn from middle school science curricula as described in Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension. It consists of 1,076 lessons from Life Science, Earth Science and Physical Science textbooks downloaded from […]. Each lesson has a set of multiple-choice questions that address concepts taught in that lesson. TQA has a total of 26,260 questions, 12,567 of which have an accompanying diagram.

    • 294,000 science-relevant tuples

      The Aristo Tuple KB contains 294,000 high-precision, domain-targeted (subject, relation, object) tuples extracted from text by a high-precision extraction pipeline guided by domain vocabulary constraints.
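The Aristo Tuple KB is described as a set of (subject, relation, object) tuples. A common way to use such a collection is to index it so that every fact mentioning a term can be retrieved quickly. The sketch below assumes the tuples have already been loaded as plain strings; the class names `Triple` and `TupleIndex` and the three example facts are invented for illustration and are not taken from the KB or any released tooling.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, relation, object) tuple; the layout is an assumption."""
    subject: str
    relation: str
    obj: str


class TupleIndex:
    """Hypothetical in-memory index over triples, keyed by subject and by object."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.by_subject = defaultdict(list)
        self.by_object = defaultdict(list)
        for t in self.triples:
            # Lower-case the keys so lookups are case-insensitive.
            self.by_subject[t.subject.lower()].append(t)
            self.by_object[t.obj.lower()].append(t)

    def about(self, term):
        """Return every triple whose subject or object equals `term`."""
        key = term.lower()
        # .get() avoids creating empty entries in the defaultdicts.
        hits = self.by_subject.get(key, []) + self.by_object.get(key, [])
        # A triple whose subject equals its object would appear twice;
        # dict.fromkeys drops duplicates while keeping the original order.
        return list(dict.fromkeys(hits))


if __name__ == "__main__":
    # Invented example facts, not entries from the Aristo Tuple KB.
    kb = TupleIndex([
        Triple("sun", "is a", "star"),
        Triple("plant", "need", "sunlight"),
        Triple("star", "produce", "light"),
    ])
    for t in kb.about("Star"):
        print(t)
```

Indexing on both subject and object is a design choice for term-centric lookup; a solver that also needs relation-based queries would add a third map keyed on `relation`.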
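The TupleInf Open IE dataset pairs each sentence with Open IE v4 tuples in a simple text format. The exact line layout is not given here, so the parser below assumes lines of the form `<confidence> (<arg1>; <relation>; <arg2>[; ...])`; that layout, the `OpenIETuple` type and `parse_tuple_line` are assumptions for illustration, and real files should be checked before relying on them. Lines that do not match the assumed layout are returned as `None` rather than guessed at.

```python
import re
from typing import List, NamedTuple, Optional


class OpenIETuple(NamedTuple):
    confidence: float
    arg1: str
    relation: str
    args: List[str]  # zero or more trailing arguments


# ASSUMED layout: "<confidence> (<arg1>; <relation>; <arg2>[; <arg3> ...])".
# The body is matched greedily up to the final ")" so arguments may contain parentheses.
LINE_RE = re.compile(r"^\s*(?P<conf>\d*\.?\d+)\s+\((?P<body>.*)\)\s*$")


def parse_tuple_line(line: str) -> Optional[OpenIETuple]:
    """Parse one tuple line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    parts = [p.strip() for p in m.group("body").split(";")]
    if len(parts) < 2:
        return None  # need at least an argument and a relation
    return OpenIETuple(float(m.group("conf")), parts[0], parts[1], parts[2:])


if __name__ == "__main__":
    # Invented example line, not copied from the dataset.
    print(parse_tuple_line("0.93 (water; boils at; 100 degrees C)"))
```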
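SciQ questions are described as 4-option multiple choice, usually with a supporting paragraph. The sketch below shows one way to hold such an item, turn it into shuffled answer options, and score predictions by accuracy. The field names (`question`, `correct_answer`, `distractors`, `support`) and the example question are assumptions for illustration, not the dataset's documented schema.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SciQItem:
    """One SciQ-style question; field names are assumed, not the official schema."""
    question: str
    correct_answer: str
    distractors: List[str] = field(default_factory=list)  # expected: 3 wrong options
    support: str = ""  # supporting paragraph; may be empty for some questions


def to_multiple_choice(item: SciQItem, rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the correct answer in with the distractors.

    Returns the 4 options and the index of the correct one.
    """
    if len(item.distractors) != 3:
        raise ValueError("expected exactly 3 distractors")
    options = [item.correct_answer] + list(item.distractors)
    rng.shuffle(options)
    return options, options.index(item.correct_answer)


def accuracy(predictions: List[int], gold: List[int]) -> float:
    """Fraction of questions whose predicted option index matches the gold index."""
    if len(predictions) != len(gold):
        raise ValueError("length mismatch")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    # Invented example, not an item from SciQ.
    item = SciQItem(
        "What gas do plants absorb during photosynthesis?",
        "carbon dioxide",
        ["oxygen", "nitrogen", "helium"],
    )
    options, gold = to_multiple_choice(item, random.Random(0))
    print(options, gold)
```

Passing an explicit `random.Random` makes the option order reproducible, which matters when comparing solvers on the same shuffled set.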