Allen Institute for AI

What-If Question Answering

Aristo • 2019
The WIQA dataset v1 has 39705 questions, each containing a perturbation and a possible effect in the context of a paragraph. The dataset is split into 29808 train questions, 6894 dev questions, and 3003 test questions.

Download the dataset without explanations. The models in the EMNLP 2019 submission were trained and tested on this version (human accuracy on these questions was found to be 96.30). We subsequently vetted the test set manually and removed 990 questions that were somewhat noisy; the resulting dataset version (v2) can be found here.

This dataset is built using 2107 influence graphs.

You can also download the dataset with explanations. We have not yet experimented with this, but it’s available for you to use. This dataset has 30499 questions with explanations: 23023 train questions, 5005 dev questions, and 2471 test questions. We removed questions where the turked explanation was potentially incorrect. The human accuracy on these questions was found to be 81.63.


Each line in a dataset file is a question specified as a JSON object. The no-explanation dataset used in EMNLP 2019 contains an empty entry for the key explanation, as shown below.

{
  "question": {
    "stem": "suppose squirrels get sick happens, how will it affect squirrel population.",
    "para_steps": [
      "A male and female rabbit mate.",
      "The female rabbit becomes pregnant.",
      "Baby rabbits form inside of the mother rabbit.",
      "The female rabbit gives birth to a litter.",
      "The newborn rabbits grow up to become adults.",
      "The adult rabbits find mates."
    ],
    "answer_label": "less",
    "answer_label_as_choice": "B",
    "choices": [
      { "label": "A", "text": "more" },
      { "label": "B", "text": "less" },
      { "label": "C", "text": "no effect" }
    ]
  },
  "explanation": {},
  "metadata": {
    "ques_id": "influence_graph:1310:156:83#3",
    "graph_id": "156",
    "para_id": "1310",
    "question_type": "OUT_OF_PARA"
  }
}
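
Because each line of a dataset file is a standalone JSON object, a file can be read one line at a time. The following is a minimal sketch of such a reader, not the loader used for the EMNLP 2019 experiments. The function name read_wiqa_lines and the shortened sample record are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def read_wiqa_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one question dict per non-empty line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# A one-record stand-in for a dataset file, shortened from the example above.
sample = [
    '{"question": {"stem": "suppose squirrels get sick happens, '
    'how will it affect squirrel population.", '
    '"answer_label": "less", "answer_label_as_choice": "B"}, '
    '"explanation": {}, "metadata": {"question_type": "OUT_OF_PARA"}}',
    "",
]

records = list(read_wiqa_lines(sample))
print(records[0]["question"]["answer_label"])  # prints: less
```

For a downloaded file, an open file handle can be passed in place of the list, since iterating over a text file also yields one line at a time.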

Explanation for each JSON field:

  • question.stem: The question, which consists of a perturbation and a likely effect.
  • question.para_steps: The text of the associated ProPara procedural text (consisting of up to 10 steps).
  • question.answer_label: The answer label value (for example, "less").
  • question.answer_label_as_choice: The label of the choice corresponding to the correct answer (for example, "B").
  • question.choices: The answer choices, each with a label and a text.
  • explanation: This key is empty for the no-explanation dataset.
  • metadata.ques_id: The unique question id.
  • metadata.graph_id: The id of the influence graph that was used to generate this question automatically.
  • metadata.para_id: The ProPara paragraph id corresponding to the para_steps.
  • metadata.question_type: The type of perturbation. A perturbation is either a linguistic variation of a passage sentence (an in-para question), requires commonsense reasoning to connect it to a passage sentence (an out-of-para question), or is a distractor, meaning that the perturbation has no impact on the effect.

Influence Graphs

Influence graph for a paragraph from the topic evaporation (paragraph 13):

Influence graph for a paragraph from the topic flashlight (paragraph 451):

Influence graph for a paragraph from the topic lungs (paragraph 16):

Influence graph for a paragraph from the topic minerals (paragraph 5):