Version 1.1 fixes some content and formatting issues in the answers from the original release.
These questions were derived from the ARC multiple-choice question set released as part of the AI2 Reasoning Challenge in 2018. The questions from the ARC Easy and ARC Challenge sets of the original dataset were combined and then filtered and modified by the following process:
Turking: Each of the multiple-choice questions was presented as a direct answer question to five crowdsourced workers to gather additional answers.
Heuristic filtering: The questions were then filtered using the following heuristics:
Further manual vetting: In-house volunteers did another vetting pass in which they:
The dataset consists of 2,985 questions in JSONL format, with the following split:
Each question is structured in JSON like this:
{
  "question_id": "ARCEZ_Mercury_7221148",
  "tag": "EASY-TRAIN",
  "question": "A baby kit fox grows to become an adult with a mass of over 3.5 kg. What factor will have the greatest influence on this kit fox's survival?",
  "answers": [
    "food availability",
    "larger predators prevalence",
    "the availability of food",
    "the population of predator in the area",
    "food sources",
    "habitat",
    "availability of food",
    "amount of predators around",
    "how smart the fox is"
  ]
}
For each field…
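As a rough illustration of working with this structure, the sketch below reads JSONL records (one JSON object per line) and counts them by their `tag` value. It relies only on the `question_id`, `tag`, `question`, and `answers` fields shown in the example above. The function names and the file name in the usage note are hypothetical, and the sketch does not assume any particular meaning for the tag strings.

```python
import json
from collections import Counter
from pathlib import Path


def parse_questions(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_questions(path):
    """Read a JSONL file from disk (the path is supplied by the caller)."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_questions(handle)


def count_by_tag(records):
    """Count how many records carry each distinct "tag" string."""
    return Counter(record["tag"] for record in records)
```

For example, `count_by_tag(load_questions("train.jsonl"))` would give a per-tag tally, where `train.jsonl` is a placeholder for whichever file from the release you are inspecting. Each record's `answers` field is a plain list of strings, so it can be iterated over directly.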
Rank | Submission | Team | Created | BLEURT |
---|---|---|---|---|
1 | UnifiedQA + ARC MC/DA + IR | Aristo team at Allen Institute for AI | 1/13/2021 | 29% |
2 | UnifiedQA + ARC-DA + IR | Aristo team at Allen Institute for AI | 1/13/2021 | 28% |
3 | UnifiedQA + ARC-DA (no IR) | Aristo team at Allen Institute for AI | 1/13/2021 | 19% |
4 | UL Test (no IR) | Google Research | 4/5/2022 | 20% |
5 | T5 (11B) | GENIE team | 12/22/2020 | 6% |