- 24K QA pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs). AllenNLP, AI2 Irvine • 2019. Quoref is a QA dataset which tests the coreferential reasoning capability of reading comprehension systems. In this span-selection benchmark containing 24K questions over 4.7K paragraphs from Wikipedia, a system must resolve hard coreferences before selecting the appropriate span(s) in the paragraphs for answering questions.
- 14k QA pairs over 1.7K paragraphs, split between train (10k QAs), development (1.6k QAs) and a hidden test partition (1.7k QAs). AllenNLP, AI2 Irvine • 2019. ROPES is a QA dataset which tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations, a novel situation that uses this background, and questions that require reasoning about the effects of the relationships in the background passage in the context of the situation.
- 9,980 8-way multiple-choice questions about grade school science. Aristo • 2019. QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
- 3864 questions about open domain qualitative relationships. Aristo • 2019. QuaRTz is a crowdsourced dataset of 3864 multiple-choice questions about open domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).
- 7,787 multiple choice questions annotated with question classification labels. Aristo • 2019. A dataset of detailed problem domain classification labels for each of the 7,787 multiple-choice science questions found in the AI2 Reasoning Challenge (ARC) dataset, to enable targeted pairing of questions with problem-specific solvers. Also included is a taxonomy of 462 detailed problem domains for grade-school science, organized into 6 levels of specificity.
- Large-scale dataset of 39705 "What if..." questions over procedural text. Aristo • 2019. The WIQA dataset V1 has 39705 questions containing a perturbation and a possible effect in the context of a paragraph. The dataset is split into 29808 train questions, 6894 dev questions and 3003 test questions.
- 12,102 multiple-choice questions with one correct answer and four distractor answers. AI2 Israel, Question Understanding • 2019. CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. It contains 12,102 questions with one correct answer and four distractor answers.
- The DROP dataset contains 96k QA pairs over 6.7K paragraphs, split between train (77k QAs), development (9.5k QAs) and a hidden test partition (9.5k QAs). AllenNLP, AI2 Irvine • 2019. DROP is a QA dataset that tests the comprehensive understanding of paragraphs. In this crowdsourced, adversarially-created, 96k question-answering benchmark, a system must resolve multiple references in a question, map them onto a paragraph, and perform discrete operations over them (such as addition, counting, or sorting).
- A large dataset of citation intent classification based on citation text. Semantic Scholar • 2019. Citations play a unique role in scientific discourse and are crucial for understanding and analyzing scientific work. However, not all citations are equal. Some citations refer to the use of a method from another work, some discuss results or findings of other work, while others are merely background or acknowledgement citations. SciCite is a dataset of 11K manually annotated citation intents based on citation context in the computer science and biomedical domains.
- 2771 story questions about qualitative relationships. Aristo • 2018. QuaRel is a crowdsourced dataset of 2771 multiple-choice story questions, including their logical forms.