Allen Institute for AI

Open PI dataset

Aristo, Mosaic • 2020
We present the first dataset for tracking state changes in procedural text from arbitrary domains using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and the entities involved, and ask how those entities change for only a small, predefined set of attributes (e.g., location), which limits their fidelity. Our solution is a new task formulation in which only the text is provided, and a set of state changes (entity, attribute, before, after) is generated for each step, where the entity, attribute, and values must all be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (vetted by humans), large-scale dataset comprising 33K state changes over 4,050 sentences from 810 real-world procedural paragraphs. A state-of-the-art generation model achieves 18% F1 (based on BLEU) on this task, leaving substantial room for novel model architectures.
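To make the (entity, attribute, before, after) formulation concrete, here is a minimal Python sketch of how one step's state changes might be represented and compared. The class name `StateChange`, the helper `exact_match_f1`, and the attribute values in the example are illustrative assumptions, not the dataset's actual schema or API. The scoring shown is plain exact-match F1 over sets, a deliberate simplification of the BLEU-based F1 mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateChange:
    """One open-vocabulary state change (hypothetical representation)."""
    entity: str
    attribute: str
    before: str
    after: str


def exact_match_f1(predicted, gold):
    """F1 over exactly matching tuples.

    Simplification: the reported metric is F1 based on BLEU, which gives
    credit for partial matches; this sketch only counts identical tuples.
    """
    pred_set, gold_set = set(predicted), set(gold)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative values loosely based on the fog-removal example;
# the attribute names are assumptions made for this sketch.
gold = [
    StateChange("car window", "clarity", "foggy", "clear"),
    StateChange("car window", "texture", "smooth", "sticky"),
]
predicted = [
    StateChange("car window", "clarity", "foggy", "clear"),
    StateChange("potato", "location", "in hand", "on window"),
]

score = exact_match_f1(predicted, gold)
print(f"exact-match F1: {score:.2f}")
```

Because entities, attributes, and values are all free text, exact matching penalizes paraphrases such as "window" versus "car window", which is one reason a softer, overlap-based metric like BLEU is a natural fit for this task.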
License: CC BY

Watch a video with slides about this work.

Some examples: coming soon.


Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Ed Hovy