Papers

Viewing 31-40 of 106 papers
  • Break, Perturb, Build: Automatic Perturbation of Reasoning Paths through Question Decomposition

    Mor Geva, Tomer Wolfson, Jonathan Berant. TACL 2021. Recent efforts to create challenge benchmarks that test the abilities of natural language understanding models have largely depended on human annotations. In this work, we introduce the “Break, Perturb, Build” (BPB) framework for automatic reasoning-oriented…
  • Measuring and Improving Consistency in Pretrained Language Models

    Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, Yoav Goldberg. TACL 2021. Consistency of a model — that is, the invariance of its behavior under meaning-preserving alternations in its input — is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs…
  • Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand?

    William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith. TACL 2021. Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the question of whether, in principle, a system can ever “understand” raw text without access to some form of grounding. We…
  • Revisiting Few-shot Relation Classification: Evaluation Data and Classification Schemes

    Ofer Sabo, Yanai Elazar, Yoav Goldberg, Ido Dagan. TACL 2021. We explore few-shot learning (FSL) for relation classification (RC). Focusing on the realistic scenario of FSL, in which a test instance might not belong to any of the target categories (none-of-the-above, [NOTA]), we first revisit the recent popular dataset…
  • Memory-efficient Transformers via Top-k Attention

    Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant. arXiv 2021. Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not possible…
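The general idea behind top-k attention can be illustrated with a minimal NumPy sketch: each query attends only to its k highest-scoring keys rather than all of them. This is an illustrative toy, not the paper's memory-efficient implementation; all names here (`topk_attention`, the shapes) are made up for the example.

```python
import numpy as np

def topk_attention(Q, K, V, k):
    """Toy top-k attention: each query row keeps only its k
    highest dot-product scores and softmaxes over those."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # (n_queries, n_keys)
    # Threshold at each row's k-th largest score; mask the rest to -inf.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over surviving scores; exp(-inf) = 0 drops masked keys.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries
K = rng.normal(size=(10, 8))   # 10 keys
V = rng.normal(size=(10, 8))
out = topk_attention(Q, K, V, k=3)
print(out.shape)  # (4, 8)
```

In this toy version the full score matrix is still materialized; the point of the paper's variants is to avoid exactly that cost, which this sketch does not attempt.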
  • SmBoP: Semi-autoregressive Bottom-up Semantic Parsing

    Ohad Rubin and Jonathan Berant. NAACL 2021. The de-facto standard decoding method for semantic parsing in recent years has been to autoregressively decode the abstract syntax tree of the target program using a top-down depth-first traversal. In this work, we propose an alternative approach: a Semi…
  • MultiModalQA: Complex Question Answering over Text, Tables and Images

    Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant. ICLR 2021. When answering complex questions, people can seamlessly combine information from visual, textual and tabular sources. While interest in models that reason over multiple pieces of evidence has surged in recent years, there has been relatively little work on…
  • BERTese: Learning to Speak to BERT

    Adi Haviv, Jonathan Berant, A. Globerson. EACL 2021. Large pre-trained language models have been shown to encode large amounts of world and commonsense knowledge in their parameters, leading to substantial interest in methods for extracting that knowledge. In past work, knowledge was extracted by taking…
  • Bootstrapping Relation Extractors using Syntactic Search by Examples

    Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg. EACL 2021. The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training…
  • Evaluating the Evaluation of Diversity in Natural Language Generation

    Guy Tevet, Jonathan Berant. EACL 2021. Despite growing interest in natural language generation (NLG) models that produce diverse outputs, there is currently no principled method for evaluating the diversity of an NLG system. In this work, we propose a framework for evaluating diversity metrics…