AllenNLP, Semantic Scholar • 2021
A dataset containing 1,585 papers with 5,049 information-seeking questions asked by regular readers of NLP papers and answered by a separate set of NLP practitioners.
License: CC BY

Current version: 0.2

Clicking Download will provide a link to download the training and development sets of the latest version of the dataset in JSON format. The files only contain text. You can download images of the tables and figures in the papers from the link below.
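Since the training and development files are plain JSON, a first look at them needs only the Python standard library. The following is a minimal sketch, not an official loader: the helper names `load_split` and `summarize` are hypothetical, the file name in the usage note is a placeholder for whatever the download provides, and nothing is assumed about the internal schema beyond the file being valid JSON.

```python
import json
from pathlib import Path


def load_split(path):
    """Read one downloaded JSON file and return the parsed object.

    `path` is wherever you saved the file; the name is not fixed here.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(data):
    """Return the top-level container type name and its number of entries."""
    return type(data).__name__, len(data)
```

For example, `summarize(load_split("train.json"))` (with `train.json` replaced by the actual downloaded file name) reports whether the top level is a `dict` or a `list` and how many entries it holds, which is a quick sanity check before writing any schema-specific code.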

Images of tables and figures in the training and development sets (about 450 MB).

Test set and official evaluator

Once you are ready to evaluate your finalized model on the test set, use the following links to download the test set and the official evaluator.

Older versions

Version 0.1 of the dataset did not contain the images of figures and tables. If you need the older version for some reason, you can access it here:


Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, Matt Gardner