Papers
COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant • EMNLP • 2021
While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually…

Question Decomposition with Dependency Graphs
Matan Hasson, Jonathan Berant • AKBC • 2021
QDMR is a meaning representation for complex questions, which decomposes questions into a sequence of atomic steps. While state-of-the-art QDMR parsers use the common sequence-to-sequence (seq2seq) approach, a QDMR structure fundamentally describes labeled…

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant • TACL • 2021
A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce STRATEGYQA, a question answering (QA) benchmark where the required reasoning steps…

Few-Shot Question Answering by Pretraining Span Selection
Ori Ram, Yuval Kirstain, Jonathan Berant, A. Globerson, Omer Levy • ACL • 2021
In a number of question answering (QA) benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training…

Neural Extractive Search
Shaul Ravfogel, Hillel Taub-Tabib, Yoav Goldberg • ACL • Demo Track • 2021
Domain experts often need to extract structured information from large corpora. We advocate for a search paradigm called “extractive search”, in which a search query is enriched with capture-slots, to allow for such rapid extraction. Such an extractive search…

Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
Ori Yoran, Alon Talmor, Jonathan Berant • arXiv • 2021
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning. In this work, we propose to leverage semi-structured tables, and automatically generate at…

Break, Perturb, Build: Automatic Perturbation of Reasoning Paths through Question Decomposition
Mor Geva, Tomer Wolfson, Jonathan Berant • TACL • 2021
Recent efforts to create challenge benchmarks that test the abilities of natural language understanding models have largely depended on human annotations. In this work, we introduce the “Break, Perturb, Build” (BPB) framework for automatic reasoning-oriented…

Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, Yoav Goldberg • TACL • 2021
Consistency of a model — that is, the invariance of its behavior under meaning-preserving alternations in its input — is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs…

Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand?
William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith • TACL • 2021
Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the question of whether, in principle, a system can ever “understand” raw text without access to some form of grounding. We…

Revisiting Few-shot Relation Classification: Evaluation Data and Classification Schemes
Ofer Sabo, Yanai Elazar, Yoav Goldberg, Ido Dagan • TACL • 2021
We explore few-shot learning (FSL) for relation classification (RC). Focusing on the realistic scenario of FSL, in which a test instance might not belong to any of the target categories (none-of-the-above, [NOTA]), we first revisit the recent popular dataset…