Viewing 9 datasets from 2014
    • 404 "odd-man-out" puzzles

      This collection consists of four sets of "odd-man-out" puzzles. There are two collections of "common noun" puzzles, where the answer options are largely common nouns, and two collections of "proper noun" puzzles, where the answer options are largely proper nouns. Each collection contains approximately 100 puzzles. The categories are taken from the card game Anomia, which was used to drive the puzzle generation process.

    • 100 geometry questions

      These questions guide our research into Question Answering for geometry exams. Focus is on the high school level. Example (note: diagrams included in data file): "In circle O, diameter AB is perpendicular to chord CD at E. If CD = 8 and BE = 2, find AE." This dataset was produced by AI2 and the University of Washington.

    • 391 arithmetic questions

      These questions guide our research into Question Answering for arithmetic exams. Focus is on high school level questions. Example: "Sandy has 10 books, Benny has 24 books, and Tim has 33 books. How many books do they have together?" This dataset was produced by AI2 and Hannaneh Hajishirzi (University of Washington).

    • 378 biology questions

      This dataset consists of 185 "how" and 193 "why" biology questions authored by a domain expert, with one or more gold answer passages identified in an undergraduate textbook. The expert was not constrained in any way during the annotation process, so gold answers might be smaller than a paragraph or span multiple paragraphs. This dataset was used for the question-answering system described in "Discourse Complements Lexical Semantics for Non-factoid Answer Reranking" (ACL 2014). This dataset was produced by AI2 and Mihai Surdeanu (University of Arizona).

    • Analysis of three co-reference types

      An understanding of co-reference (i.e., multiple references to the same thing) is necessary to understand the meaning of a text. This dataset is an analysis of co-reference types occurring in 4th-grade biology textbooks. This analysis was based on the New York State Education Department's Grade 4 Elementary-Level Science Test (accessed July 2014).

    • 630 paper annotations

      This dataset consists of annotations for 465 computer science papers. The annotations indicate whether a citation is important (i.e., refers to ongoing or continued work on the relevant topic) or not, and then assign the citation one of four importance rankings. This dataset was produced at AI2 as part of intern Marco Valenzuela's work for his paper, "Identifying Meaningful Citations".

    • 33 paraphrases

      Vocabulary used in questions may differ from that of sources contributing to our Question Answering knowledge base. Relevant paraphrases like these help the QA system understand connections between question vocabulary and knowledge base vocabulary. This dataset is an analysis of PPDB paraphrases relevant to 4th-grade biology exams, carried out by AI2 intern Ellie Pavlick.

    • 2600 open-source artificial intelligence resources

      Open AI Resources is a directory of open source software and data for the AI research community. The site was initially developed by AI2 and InferLink Corporation, and is currently managed by the AI Access Foundation.

    • 200 annotated paragraphs about biological processes

      This dataset was used to train a system to automatically extract process models from paragraphs that describe processes. The dataset consists of 200 paragraphs that describe biological processes. Each paragraph is annotated with its process structure, and accompanied by a few multiple-choice questions about the process. Each question has two possible answers of which exactly one is correct. The dataset contains three files:
      1. bioprocess-bank-questions.tar.gz: There is one XML file per paragraph, containing the paragraph ID, the questions, and the answers.
      2. process-bank-structures-train.tar.gz: These are the structure annotations used for training our structure predictor. Each paragraph has two files, one containing the text and one containing the annotation, in standard BRAT format.
      3. process-bank-structures-test.tar.gz: These are structure annotations used for testing. They are also in BRAT format.
      The dataset was produced by AI2 and Jonathan Berant (Stanford University).
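
      For readers unfamiliar with BRAT, the sketch below shows one way to read the text-bound ("T") lines of a BRAT standoff annotation file in Python. It assumes the usual BRAT layout of tab-separated fields with character offsets into the paired text file. The sample paragraph, the label names (Entity, Trigger, Agent) and the helper parse_text_bound are hypothetical and are not taken from the dataset; the actual label set used in the structure files may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBound:
    """One BRAT text-bound annotation (a 'T' line)."""
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann_content: str) -> List[TextBound]:
    """Collect text-bound annotations from the contents of a .ann file.

    Lines with other ID prefixes (events, relations, notes) and
    discontinuous spans are skipped to keep the sketch short.
    """
    spans = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        ann_id, type_and_offsets, text = parts
        if ";" in type_and_offsets:
            continue  # discontinuous span, not handled here
        label, start, end = type_and_offsets.split(" ")
        spans.append(TextBound(ann_id, label, int(start), int(end), text))
    return spans


# Hypothetical paragraph and annotation; the label names are invented.
sample_text = "The enzyme binds the substrate."
sample_ann = (
    "T1\tEntity 4 10\tenzyme\n"
    "T2\tTrigger 11 16\tbinds\n"
    "E1\tTrigger:T2 Agent:T1\n"
)

spans = parse_text_bound(sample_ann)
for span in spans:
    # Offsets index into the paired text file.
    assert sample_text[span.start:span.end] == span.text
    print(span.ann_id, span.label, span.text)
```

      In practice, the two strings would be read from a paragraph's text file and annotation file after unpacking one of the structure archives.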