Allen Institute for AI

Datasets

  • AI2 Conversational Dialog Traces

    81 dialog traces and extractions • Aristo • 2015
    This dataset contains files for the paper "Learning knowledge graphs for question answering through conversational dialog", presented at the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2015), Denver, Colorado, May 31 - June 5, 2015.
  • MLNs for QA, Data: Markov Logic Networks for Natural Language Question Answering

    Evaluations for 108 real science exam questions • Aristo • 2015
    This work explores the use of Markov Logic Networks (MLNs) for answering elementary-level natural language science questions. The dataset contains the MLNs generated from three different formulations along with a README describing the format.
  • AI2 Geometry Questions

    100 geometry questions • 2014
    These questions guide our research into Question Answering for geometry exams. The focus is on the high school level.
  • AI2 Arithmetic Questions

    391 arithmetic questions • 2014
    These questions guide our research into Question Answering for arithmetic exams. The focus is on high-school-level questions.
  • AI2 Biology How/Why Corpus

    378 biology questions • Aristo • 2014
    This dataset consists of 185 "how" and 193 "why" biology questions authored by a domain expert, with one or more gold answer passages identified in an undergraduate textbook.
  • AI2 Meaningful Citations Data Set

    630 paper annotations • Semantic Scholar • 2014
    This dataset comprises annotations for 465 computer science papers. The annotations indicate whether a citation is important (i.e., refers to ongoing or continued work on the relevant topic) and assign the citation one of four importance rankings.
  • AI2 ProcessBank Data

    200 annotated paragraphs about biological processes • Aristo • 2014
    The dataset consists of 200 paragraphs that describe biological processes. Each paragraph is annotated with its process structure, and accompanied by a few multiple-choice questions about the process. Each question has two possible answers of which exactly one is correct. This dataset was used to train a system to automatically extract process models from paragraphs that describe processes.