The Aristo Tuple KB is a collection of domain-targeted (subject, relation, object) tuples extracted from text by a high-precision extraction pipeline guided by domain vocabulary constraints.
Download the dataset and text corpus:
- Aristo Tuple KB v5 (March 2017): 282,594 science-relevant tuples (TACL 2017 data is included in this dataset)
- Aristo Mini Corpus: Text corpus used for measuring "comprehensiveness"
Domain-Targeted, High Precision Knowledge Extraction
Our goal is to construct a domain-targeted, high precision knowledge base (KB), containing general (subject,predicate,object) statements about the world, in support of a downstream question-answering (QA) application. Despite recent advances in information extraction (IE) techniques, no suitable resource for our task already exists; existing resources are either too noisy, too named-entity centric, or too incomplete, and typically have not been constructed with a clear scope or purpose. To address these shortcomings, we have created a domain-targeted, high precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high precision knowledge targeted to a particular domain - in our case, elementary science. To measure the KB’s coverage of the target domain's knowledge (its "comprehensiveness" with respect to science), we measure recall with respect to an independent corpus of domain text, and show that our pipeline produces output with over 80% precision and 23% recall with respect to that target, a substantially higher coverage of tuple-expressible science knowledge than other comparable resources. We have made the KB publicly available.
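To make the precision and recall figures above concrete, here is a minimal Python sketch of how the two measures relate to sets of tuples. It is an illustration only, not the evaluation procedure used in the paper: the function names, the exact-match comparison after lowercasing, and the toy tuples are all assumptions made for this example.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and trim each field so trivially different spellings match."""
    subject, relation, obj = triple
    return (subject.strip().lower(), relation.strip().lower(), obj.strip().lower())


def precision(kb: Iterable[Triple], judged_correct: Iterable[Triple]) -> float:
    """Fraction of KB tuples that are in the set judged correct (e.g. by annotators)."""
    kb_set = {normalize(t) for t in kb}
    correct = {normalize(t) for t in judged_correct}
    return len(kb_set & correct) / len(kb_set) if kb_set else 0.0


def recall(kb: Iterable[Triple], reference: Iterable[Triple]) -> float:
    """Fraction of reference tuples (from an independent corpus) found in the KB."""
    kb_set = {normalize(t) for t in kb}
    ref_set = {normalize(t) for t in reference}
    return len(kb_set & ref_set) / len(ref_set) if ref_set else 0.0


# Hypothetical toy data, not taken from the released KB.
kb = [("bird", "eat", "seed"), ("Plant", "need", "water"), ("rock", "eat", "grass")]
judged_correct = [("bird", "eat", "seed"), ("plant", "need", "water")]
reference = [
    ("plant", "need", "water"),
    ("magnet", "attract", "iron"),
    ("bird", "eat", "seed"),
    ("sun", "produce", "light"),
]

print(f"precision = {precision(kb, judged_correct):.2f}")
print(f"recall    = {recall(kb, reference):.2f}")
```

In this toy setup, two of the three KB tuples are judged correct and two of the four reference tuples are covered. A real evaluation would likely need a softer notion of tuple matching than exact string equality.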
If you use this data in your research, please refer to the tuple KB by its release name and date ("Aristo Tuple KB v5 - Mar 2017 Release") and acknowledge AI2 (www.allenai.org).
Previous releases of the Aristo Tuple KB may be found below:
- Aristo Tuple KB v4 (March 2017) (NB: this is the first of two March 2017 releases)
- Aristo Tuple KB v3 (January 2017)
- Aristo Tuple KB v2 (December 2016)
- Aristo Tuple KB v1 (November 2016)
If you have any other questions or feedback for us about this data, please contact email@example.com.