Discrete Reasoning Over the content of Paragraphs (DROP)

AllenNLP, AI2 Irvine • 2019
Many diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. With so many datasets available, comprehensive and reliable evaluation is tedious and time-consuming. ORB is an evaluation server that reports performance on diverse reading comprehension datasets, making it easier to test a single model's ability to understand a wide variety of reading phenomena. It also includes a suite of synthetic augmentations that test a model's ability to generalize to out-of-distribution syntactic structures.
License: CC BY

Leaderboard

Top Public Submissions
Rank  Details                                                  Created     Exact Match
1     QDGAT - ALBERT (AntGroup KG & NLP)                       9/8/2020    87%
2     Numeric Transformer - Albert (OneConnect GammaLab NYC)   3/17/2020   86%
3     QDGAT Ensemble (AntGroup KG & NLP)                       12/15/2019  85%
4     sna_albert+ Ensemble (OneConnect GammaLab)               12/3/2019   85%
5     QDGAT - RoBERTa (AntGroup KG & NLP)                      6/1/2020    85%

Authors

Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner