Allen Institute for AI

AI2 ProcessBank Data

Aristo • 2014
The dataset consists of 200 paragraphs that describe biological processes. Each paragraph is annotated with its process structure and accompanied by a few multiple-choice questions about the process. Each question has two possible answers, of which exactly one is correct. The dataset was used to train a system that automatically extracts process models from paragraphs describing processes.

Dataset contents

The dataset contains three files:

  1. bioprocess-bank-questions.tar.gz: One XML file per paragraph, containing the paragraph ID, the questions, and the answers.
  2. process-bank-structures-train.tar.gz: The structure annotations used for training our structure predictor. Each paragraph has two files: one containing the text and one containing the annotation, in standard BRAT format.
  3. process-bank-structures-test.tar.gz: The structure annotations used for testing, also in BRAT format.
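
The structure files can be read with a few lines of code. The following is a minimal sketch, not part of the dataset release, that parses the text-bound ("T") and binary relation ("R") lines of a BRAT standoff annotation. It assumes the usual BRAT convention of a .txt file paired with a tab-separated .ann file. The sample text, the labels "Trigger" and "Cause", and the function name parse_brat_ann are hypothetical and chosen only for illustration; the actual label set and any other annotation types (such as event lines) in ProcessBank may differ, and this sketch simply skips them.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    ann_id: str
    label: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    ann_id: str
    label: str
    arg1: str
    arg2: str


def parse_brat_ann(ann_text):
    """Parse text-bound (T) and binary relation (R) lines of a BRAT .ann file.

    Other annotation types (events, attributes, notes) are ignored.
    """
    entities, relations = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Layout: T1<TAB>Label start end<TAB>covered text
            label, offsets = fields[1].split(" ", 1)
            # Discontinuous spans look like "0 5;10 15"; keep the outer bounds.
            parts = offsets.replace(";", " ").split()
            covered = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = TextBound(
                ann_id, label, int(parts[0]), int(parts[-1]), covered
            )
        elif ann_id.startswith("R"):
            # Layout: R1<TAB>Label Arg1:T1 Arg2:T2
            label, a1, a2 = fields[1].split()
            relations.append(
                Relation(ann_id, label, a1.split(":", 1)[1], a2.split(":", 1)[1])
            )
    return entities, relations


# Hypothetical paragraph and annotation, for illustration only.
SAMPLE_TXT = "Water enters the cell, which causes swelling."
SAMPLE_ANN = (
    "T1\tTrigger 6 12\tenters\n"
    "T2\tTrigger 36 44\tswelling\n"
    "R1\tCause Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_brat_ann(SAMPLE_ANN)
for rel in relations:
    src, dst = entities[rel.arg1], entities[rel.arg2]
    print(f"{src.text} -[{rel.label}]-> {dst.text}")
```

Character offsets in BRAT index into the paired text file, so SAMPLE_TXT[e.start:e.end] should reproduce the covered text of each annotation; checking this is a cheap way to confirm that a text file and its annotation file belong together.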

Authors

This dataset was produced by the Allen Institute for AI and Jonathan Berant (Stanford University).