Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi
HellaSwag is a commonsense inference challenge dataset that is trivial for humans but difficult even for the most advanced language models. The way the dataset was constructed also suggests a new path forward for NLP research, in which benchmarks co-evolve with the state of the art in an adversarial way, so as to present ever-harder challenges.
|
|