Papers

  • Linear Adversarial Concept Erasure

    Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell • ICML • 2022 We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear minimax game, and show that existing…
  • A Dataset for N-ary Relation Extraction of Drug Combinations

    Aryeh Tiktinsky, Vijay Viswanathan, Danna Niezni, Dana Azagury, Yosi Shamay, Hillel Taub-Tabib, Tom Hope, Yoav Goldberg • NAACL • 2022 Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available…
  • Weakly Supervised Text-to-SQL Parsing through Question Decomposition

    Tomer Wolfson, Daniel Deutch, Jonathan Berant • Findings of NAACL • 2022 Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we…
  • Large Scale Substitution-based Word Sense Induction

    Matan Eyal, Shoval Sadde, Hillel Taub-Tabib, Yoav Goldberg • ACL • 2022 We present a word-sense induction method based on pre-trained masked language models (MLMs), which can cheaply scale to large vocabularies and large corpora. The result is a corpus which is sense-tagged according to a corpus-derived sense inventory and where…
  • Inferring Implicit Relations with Language Models

    Uri Katz, Mor Geva, Jonathan Berant • NAACL • UnImplicit • 2022 A prominent challenge for modern language understanding systems is the ability to answer implicit reasoning questions, where the required reasoning steps for answering the question are not mentioned in the text explicitly. In this work, we investigate why…
  • LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models

    Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg • arXiv • 2022 The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred a wide interest in interpreting their predictions. However, current interpretation methods mostly focus on probing models from outside, executing behavioral…
  • Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

    Mor Geva, Avi Caciularu, Kevin Ro Wang, Yoav Goldberg • arXiv • 2022 Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by…
  • Text-based NP Enrichment

    Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty • TACL • 2022 Understanding the relations between entities denoted by NPs in text is a critical part of human-like natural language understanding. However, only a fraction of such relations is covered by NLP tasks and models nowadays. In this work, we establish the task of…
  • SCROLLS: Standardized CompaRison Over Long Language Sequences

    Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy • arXiv • 2022 NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We…
  • CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

    Alon Talmor, Ori Yoran, Ronan Le Bras, Chandrasekhar Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant • NeurIPS • 2021 Constructing benchmarks that test the abilities of modern natural language understanding models is difficult – pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors…