Allen Institute for AI

QuaRTz Dataset

Aristo • 2019
QuaRTz (V1) is a crowdsourced dataset of 3864 multiple-choice questions about open-domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).

The dataset is split into train (2696), dev (384) and test (784) questions. Each background sentence appears in exactly one split.
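
The single-split property can be checked mechanically. The following is a minimal sketch, assuming each split is available as an iterable of JSON lines (the format described below); the helper names `para_ids` and `shared_para_ids` and the tiny demo records are hypothetical and not part of the dataset release.

```python
import json
from itertools import combinations


def para_ids(lines):
    """Collect the para_id of every question in an iterable of JSON lines."""
    return {json.loads(line)["para_id"] for line in lines if line.strip()}


def shared_para_ids(splits):
    """Map each pair of split names to the para_ids they have in common.

    Pairs with no overlap are left out, so an empty dict means the
    splits are disjoint with respect to background sentences.
    """
    ids = {name: para_ids(lines) for name, lines in splits.items()}
    return {
        (a, b): ids[a] & ids[b]
        for a, b in combinations(sorted(ids), 2)
        if ids[a] & ids[b]
    }


# Tiny in-memory stand-ins for the real split files (made-up records).
demo_splits = {
    "train": ['{"para_id": "P-1", "id": "Q-1"}', '{"para_id": "P-1", "id": "Q-2"}'],
    "dev": ['{"para_id": "P-2", "id": "Q-3"}'],
    "test": ['{"para_id": "P-3", "id": "Q-4"}'],
}
print(shared_para_ids(demo_splits))  # an empty dict means no overlap
```

With the real data, an open file handle for each split can be passed in place of the lists, since iterating over a text file yields one line at a time.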

Each line in a dataset file is a question specified as a JSON object. For example (with extra whitespace for readability):

{
  "para_id":"QRSent-10360",
  "id":"QRQA-10360-5-flip",
  "para":"A sunscreen with a higher SPF protects the skin longer.",
  "question":{
    "stem":"John was looking at sunscreen at the retail store. He noticed that sunscreens that had lower SPF would offer protection that is", 
    "choices":[{"label":"A","text":"Longer"},{"label":"B","text":"Shorter"}]},
  "answerKey":"B",
  "para_anno":{
    "effect_dir_sign":"MORE", 
    "cause_dir_sign":"MORE",
    "effect_prop":"protection",
    "cause_prop":"SPF",
    "cause_dir_str":"higher",
    "effect_dir_str":"longer"},
  "question_anno":{
    "more_effect_dir":"longer",
    "less_effect_dir":"Shorter",
    "less_cause_prop":"spf",
    "more_effect_prop":"protection",
    "less_cause_dir":"lower",
    "less_effect_prop":"protection"}
}
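
As an illustration of reading one such line, here is a minimal Python sketch that parses a record and looks up the text of the correct choice. The record is abbreviated from the example above (annotations omitted); the helper name `correct_answer_text` is hypothetical.

```python
import json

# One dataset line, abbreviated from the example above (annotations omitted).
line = (
    '{"para_id":"QRSent-10360","id":"QRQA-10360-5-flip",'
    '"para":"A sunscreen with a higher SPF protects the skin longer.",'
    '"question":{"stem":"John was looking at sunscreen at the retail store. '
    'He noticed that sunscreens that had lower SPF would offer protection that is",'
    '"choices":[{"label":"A","text":"Longer"},{"label":"B","text":"Shorter"}]},'
    '"answerKey":"B"}'
)


def correct_answer_text(record):
    """Return the text of the choice whose label matches answerKey."""
    for choice in record["question"]["choices"]:
        if choice["label"] == record["answerKey"]:
            return choice["text"]
    raise ValueError(f"answerKey {record['answerKey']!r} not among the choices")


record = json.loads(line)
# Flipped questions are marked by the "-flip" suffix on the id.
is_flipped = record["id"].endswith("-flip")
print(correct_answer_text(record), is_flipped)  # Shorter True
```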

Explanation of each JSON field:

  • id: Unique question id, ends with “-flip” if it’s a “flipped” version of an original question
  • para_id: Unique background sentence id
  • para: The text of the associated background sentence (paragraph)
  • question: Contains the question stem and answer choices
  • answerKey: The label corresponding to the correct answer
  • para_anno: Annotations related to the background sentence:
    • cause_dir_sign: MORE or LESS indicating the direction of change for the “cause”
    • cause_prop: Surface form associated with the cause
    • cause_dir_str: Surface form associated with the change direction
    • effect_*: Same as the cause_* fields, but for the effect property
  • question_anno: Annotations related to the question
    • more_cause_dir: Surface form (if any) associated with the direction of change for the cause property, in the direction of “MORE”
    • less_cause_dir: Same, but for direction “LESS”
    • more_cause_prop: Same, but for associated property rather than direction
    • *_effect_*: Same as the *_cause_* fields, but for the effect property
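
The annotation fields can be read with a few lines of Python. The sketch below uses the annotations from the example record; it assumes the question_anno keys follow the pattern <more|less>_<cause|effect>_<dir|prop>, which is inferred from the example and the field list rather than stated as a formal schema. The function names `para_relation` and `question_forms` are hypothetical.

```python
# Annotations copied from the example record above.
para_anno = {
    "effect_dir_sign": "MORE", "cause_dir_sign": "MORE",
    "effect_prop": "protection", "cause_prop": "SPF",
    "cause_dir_str": "higher", "effect_dir_str": "longer",
}
question_anno = {
    "more_effect_dir": "longer", "less_effect_dir": "Shorter",
    "less_cause_prop": "spf", "more_effect_prop": "protection",
    "less_cause_dir": "lower", "less_effect_prop": "protection",
}


def para_relation(anno):
    """Summarize the background sentence as 'cause -> effect' surface forms."""
    cause = f"{anno['cause_dir_str']} {anno['cause_prop']}"
    effect = f"{anno['effect_dir_str']} {anno['effect_prop']}"
    return f"{cause} -> {effect}"


def question_forms(anno, direction, role):
    """Return the (direction, property) surface forms for one slot.

    direction is "more" or "less"; role is "cause" or "effect".
    A missing key yields None, since these surface forms are optional.
    """
    prefix = f"{direction}_{role}"
    return anno.get(f"{prefix}_dir"), anno.get(f"{prefix}_prop")


print(para_relation(para_anno))                        # higher SPF -> longer protection
print(question_forms(question_anno, "less", "cause"))  # ('lower', 'spf')
print(question_forms(question_anno, "more", "cause"))  # (None, None)
```

In this example the question mentions only the “LESS” direction of the cause (lower SPF), so the more_cause_* entries are absent and come back as None.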