Viewing 7 datasets from 2016
    • 1,197,377 science-relevant sentences

      The Aristo Mini corpus contains 1,197,377 (very loosely) science-relevant sentences drawn from public data. It provides simple science-relevant text that may help answer elementary science questions. It is used in the Aristo Mini system and is also available here as a resource in its own right.

    • 6,952 real science exam questions derived from a variety of item banks

      The AI2 Science Questions Mercury dataset consists of questions used in student assessments across elementary and middle school grade levels, provided under license by an AI2 research partner.

    • 1,363 gold explanation sentences

      This is the dataset for the paper "What's in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams" (COLING'16). The data include gold explanation sentences supporting 363 science questions, relation annotations for a subset of those explanations, and a graphical annotation tool with annotation guidelines. This dataset was produced by AI2, the University of Arizona, and Stony Brook University.

    • 4,817 images

      AI2D is a dataset of illustrative diagrams for research on diagram understanding and associated question answering.

    • 1,080 questions

      These questions were created from the "AI2 Elementary School Science Questions (No Diagrams)" dataset by replacing every incorrect answer option of each question with another related word. This dataset can be a good measure of the robustness of QA systems when they are tested on modified questions. More details can be found in the paper "Question Answering via Integer Programming over Semi-Structured Knowledge."

    • 9,850 videos

      This dataset guides our research into unstructured video activity recognition and commonsense reasoning for daily human activities. These videos of daily indoor activities were collected through Amazon Mechanical Turk.

    • 9,092 crowd-sourced science questions and 68 tables of curated facts

      This package contains a copy of the Aristo Tablestore (Nov. 2015 Snapshot), plus a large set of crowd-sourced multiple-choice questions covering the facts in the tables. Because of how the crowd-sourced annotation task was set up, the package also contains implicit alignment information between questions and tables. For further information, see "TabMCQ: A Dataset of General Knowledge Tables and Multiple-choice Questions" (PDF included in this package). This dataset was produced by AI2 and Sujay Kumar Jauhar (Carnegie Mellon University).