Allen Institute for AI

AI2 Biology How/Why Corpus

Aristo • 2014
This dataset consists of 185 "how" and 193 "why" biology questions authored by a domain expert, with one or more gold answer passages identified in an undergraduate textbook.


The expert was not constrained in any way during the annotation process, so a gold answer may be shorter than a paragraph or may span multiple paragraphs.

This dataset was used for the question-answering system described in the paper “Discourse Complements Lexical Semantics for Non-factoid Answer Reranking” (ACL 2014).


This dataset was produced by the Allen Institute for AI and Mihai Surdeanu (University of Arizona).