Discrete Reasoning Over the content of Paragraphs (DROP)
AllenNLP, AI2 Irvine • 2019
Many diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given the availability of so many datasets, comprehensive and reliable evaluation is tedious and time-consuming. ORB is an evaluation server that reports performance on diverse reading comprehension datasets, encouraging and facilitating the testing of a single model's capability in understanding a wide variety of reading phenomena. It also includes a suite of synthetic augmentations that test a model's ability to generalize to out-of-distribution syntactic structures.
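The core idea — scoring one model's predictions against gold answers for several datasets and reporting a per-dataset metric — can be sketched as below. This is a minimal illustration, not ORB's actual API: the dataset names, function names, and the exact-match scoring rule are all assumptions chosen for clarity.

```python
# Hypothetical sketch of multi-dataset evaluation aggregation.
# All names and the scoring rule are illustrative, not ORB's real interface.

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(prediction.strip().lower() == gold.strip().lower())

def evaluate_across_datasets(predictions, gold_answers):
    """Return {dataset_name: mean exact-match} for each dataset."""
    report = {}
    for name, preds in predictions.items():
        golds = gold_answers[name]
        scores = [exact_match(p, g) for p, g in zip(preds, golds)]
        report[name] = sum(scores) / len(scores)
    return report

if __name__ == "__main__":
    preds = {"drop": ["4", "Seattle"], "squad": ["1999"]}
    golds = {"drop": ["4", "seattle"], "squad": ["2001"]}
    print(evaluate_across_datasets(preds, golds))
    # {'drop': 1.0, 'squad': 0.0}
```

A real evaluation server would additionally use each dataset's own official metric (e.g. F1 over answer spans rather than strict exact match), but the aggregation pattern is the same.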