AI2 Conversational Dialog Traces

Aristo • 2015
This dataset contains files for the paper "Learning knowledge graphs for question answering through conversational dialog", presented at the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2015), Denver, Colorado. May 31 - June 5, 2015.
License: See Repo

This dataset was produced as part of a project on learning from conversational dialog (see paper). It contains the Conversational Dialog Questions, dialog traces, and extractions from KnowBot, an experimental dialog system that learns about its domain from conversational dialogs with the user.

The following resources are included in this archive:

  1. 107qns.txt: A flat-text file containing the 107 questions. (Note that other versions of Aristo use different question sets.)

  2. aristo_9_4_2014.sqlite: The SQLite database of the questions and extractions used as backend knowledge. It can be loaded directly in Python with the standard sqlite3 module, or browsed with a tool such as the Firefox SQLite Manager plugin.

  3. The dialogs used to compute the results in Table 2. These are divided into three directories:
       - Aristo1: the initial system, using only the IQE keyword strategy
       - Aristo2: uses the user-initiative strategy
       - Aristo3: uses the mixed-initiative strategy
     Each dialog is paired with a json file that records the extractions and system decisions made during that dialog (these files are not intended to be human-readable, but the stouthearted can contact me for formatting details). For completeness’ sake, this batch also includes empty dialogs, in which the user immediately ends the dialog (often corresponding to page reloads).

  4. knowledge.json: The aggregate file containing all extractions from the above dialogs, used to build Table 2. The top-level json keys are the question IDs (‘qid’). Each qid has the following subkeys:
       - ‘qtext’: the question text.
       - ‘relations’: the relations extracted for the question, as described in the paper. The first argument is called ‘source’ and the second ‘target’ (an artifact of the knowledge graph format).
       - ‘utterances’: the utterances in which the relations originate, organized by relation.
       - ‘ngrams’: the unigrams and bigrams of words in the utterances, keyed by relation. A useful feature describing the relation.
       - ‘intexts’: the set of text spans between the argument keywords, over all utterances, keyed by relation. Can also act as a useful feature.

  5. knowledge_visualizations.html: The d3 visualization for viewing the knowledge graph of each question. It also serves as an example of how to use knowledge.json. Example visualizations can be found in the file example_visualizations.pdf.
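Because the schema of aristo_9_4_2014.sqlite is not documented here, a reasonable first step is to list its tables and columns. The sketch below does this with Python's standard-library sqlite3 module. The helper name describe_database is hypothetical, and opening the file read-only (mode=ro) is a choice made so that a mistyped path raises an error instead of silently creating a new, empty database.

```python
import sqlite3


def describe_database(path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    # Open read-only via a URI so a wrong path fails instead of creating a file.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Double any embedded quotes so the name is a valid quoted identifier.
            quoted = table.replace('"', '""')
            # PRAGMA table_info yields one row per column; index 1 is the name.
            cols = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()
```

For example, with the file in the current directory, describe_database("aristo_9_4_2014.sqlite") returns a dict mapping each table name to its column names, which is enough to start writing real queries.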
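The structure of knowledge.json described in item 4 can be walked with Python's json module. This is a minimal sketch, not the code used by knowledge_visualizations.html: it assumes (unverified) that each qid's ‘relations’ value is a list of objects with ‘source’ and ‘target’ keys, and the helpers load_knowledge, relation_pairs, and to_d3_graph are hypothetical. to_d3_graph turns one question's relations into the nodes/links shape accepted by d3 force layouts, where links refer to nodes by their index in the nodes array.

```python
import json


def load_knowledge(path):
    """Load knowledge.json: a dict keyed by question id ('qid')."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def relation_pairs(entry):
    """Yield (source, target) pairs for one question entry.

    ASSUMPTION: entry["relations"] is a list of dicts with "source" and
    "target" keys; adjust this if the actual file is shaped differently.
    """
    for rel in entry.get("relations", []):
        yield rel["source"], rel["target"]


def to_d3_graph(entry):
    """Build a {"nodes": [...], "links": [...]} dict for a d3 force layout.

    Each distinct relation argument becomes one node; each relation becomes
    one link whose "source"/"target" are indices into the nodes list.
    """
    index = {}
    nodes, links = [], []
    for source, target in relation_pairs(entry):
        for name in (source, target):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name})
        links.append({"source": index[source], "target": index[target]})
    return {"nodes": nodes, "links": links}
```

For example, knowledge = load_knowledge("knowledge.json") followed by to_d3_graph(knowledge[qid]) for any qid in knowledge gives a graph that could be passed to d3 as JSON; knowledge[qid]["qtext"] gives the matching question text.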

Note: this tool uses d3 (a terrific JavaScript visualization library) and assumes that both ‘knowledge.json’ and the d3 library ‘d3.v3.min.js’ (also included in this archive) are located in the same directory. It can’t be opened in a browser as-is without dangerously loosening local security settings. The easiest way I know to open it is to first run a simple local server from the terminal, with python -m SimpleHTTPServer 8888 (Python 2) or python3 -m http.server 8888 (Python 3), then go to http://localhost:8888/ in your browser and navigate to the directory containing the file.
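If you prefer a script to the one-line command, a similar server can be started from Python 3's standard library. This sketch (the helper name make_server is hypothetical) serves a chosen directory, such as the unpacked archive, regardless of where the script is run from, and binds only to the loopback address 127.0.0.1 so the files are not exposed to other machines.

```python
import functools
import http.server


def make_server(directory, port=8888):
    """Create an HTTP server that serves files from `directory` on 127.0.0.1."""
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # ThreadingHTTPServer handles each request in its own thread.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For example, make_server("path/to/archive").serve_forever(), where path/to/archive is a placeholder for wherever you unpacked the files, then open http://localhost:8888/knowledge_visualizations.html; press Ctrl+C to stop the server.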


This dataset was produced by Allen Institute for AI as part of Ben Hixon’s internship project.