This dataset was produced as part of a project on Conversational Dialog Questions (see paper). It includes the questions, dialog traces, and extractions from KnowBot, an experimental dialog system that learns about its domain from conversational dialogs with the user.
The following resources are included in this archive:
107qns.txt: A flat-text file containing the 107 questions. (Note that other versions of Aristo use different question sets.)
aristo_9_4_2014.sqlite: The SQLite database of the questions and extractions used as backend knowledge. This file can be loaded directly in Python with the sqlite3 library, or browsed with a tool such as the Firefox SQLite Manager plugin.
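As a minimal sketch of loading the database from Python, the snippet below lists every table and its column names. It assumes Python 3 and makes no assumption about the schema, which is not documented here; the helper name list_tables is hypothetical and only for illustration.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        cur = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        tables = [row[0] for row in cur.fetchall()]
        schema = {}
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            cols = conn.execute('PRAGMA table_info("{}")'.format(table)).fetchall()
            schema[table] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()


if __name__ == "__main__":
    db_file = "aristo_9_4_2014.sqlite"  # run from the archive directory
    if os.path.exists(db_file):
        for table, columns in list_tables(db_file).items():
            print(table, columns)
```

Inspecting the schema first is a cautious starting point, since the table and column names should be read from the file itself rather than guessed.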
The dialogs used to compute the results in Table 2. These are divided into three directories:
    Aristo1: corresponds to the initial system using only the IQE keyword strategy
    Aristo2: uses the user-initiative strategy
    Aristo3: uses the mixed-initiative strategy
Each dialog is paired with a json file that reflects the extractions and system decisions made during a particular dialog (these files are not intended to be human-readable but the stouthearted can contact me for formatting details). This batch includes empty dialogs (in which the user immediately ends the dialog, often corresponding to page reloads) for completeness’ sake.
knowledge.json: The aggregate file containing all extractions from the above dialogs used to build Table 2. The top-level json keys are the question ids (‘qid’). Each qid has the following subkeys:
    ‘qtext’: the question text
    ‘relations’: the relations extracted for each question as described in the paper. The first argument is called ‘source’ and the second argument is called ‘target’ (an artifact of the knowledge graph format).
    ‘utterances’: the utterances in which the relations originate, organized by relation
    ‘ngrams’: the unigrams and bigrams of words in the utterances, keyed by relation. A useful feature for describing the relation.
    ‘intexts’: the set of text spans between the argument keywords over all utterances, keyed by relation. Can act as a useful feature.
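The structure described above can be explored with a short script. This is a sketch only: it assumes Python 3 and a UTF-8 encoded file, and it relies only on the ‘qtext’ and ‘relations’ subkeys. It does not assume whether ‘relations’ is stored as a list or a dict, so it only counts entries. The function name summarize_knowledge is hypothetical.

```python
import json


def summarize_knowledge(path):
    """Map each qid to its question text and number of extracted relations."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        knowledge = json.load(f)

    summary = {}
    for qid, entry in knowledge.items():
        summary[qid] = {
            "qtext": entry.get("qtext", ""),
            # len() works whether 'relations' is a list or a dict.
            "n_relations": len(entry.get("relations", [])),
        }
    return summary


if __name__ == "__main__":
    for qid, info in summarize_knowledge("knowledge.json").items():
        print(qid, info["n_relations"], info["qtext"])
```

The other subkeys (‘utterances’, ‘ngrams’, ‘intexts’) are keyed by relation, so the same loop can be extended once the relation format has been checked against the actual file.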
knowledge_visualizations.html: A d3 visualization of the knowledge graphs for each question. This also acts as an example of using knowledge.json. Example visualizations can be found in the file example_visualizations.pdf.
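To view the visualization, start a local web server from the archive directory with
python -m SimpleHTTPServer 8888
(on Python 3 the equivalent command is python -m http.server 8888), then open
http://localhost:8888/ in your browser and navigate to the directory containing the file.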
This dataset was produced by the Allen Institute for AI as part of Ben Hixon’s internship project.