Papers
Viewing 1-10 of 100 papers
Linear Adversarial Concept Erasure
Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell
ICML • 2022
We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained, linear minimax game, and show that existing…

A Dataset for N-ary Relation Extraction of Drug Combinations
Aryeh Tiktinsky, Vijay Viswanathan, Danna Niezni, Dana Azagury, Yosi Shamay, Hillel Taub-Tabib, Tom Hope, Yoav Goldberg
NAACL • 2022
Combination therapies have become the standard of care for diseases such as cancer, tuberculosis, malaria and HIV. However, the combinatorial set of available multi-drug treatments creates a challenge in identifying effective combination therapies available…

Weakly Supervised Text-to-SQL Parsing through Question Decomposition
Tomer Wolfson, Daniel Deutch, Jonathan Berant
Findings of NAACL • 2022
Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we…

Large Scale Substitution-based Word Sense Induction
Matan Eyal, Shoval Sadde, Hillel Taub-Tabib, Yoav Goldberg
ACL • 2022
We present a word-sense induction method based on pre-trained masked language models (MLMs), which can cheaply scale to large vocabularies and large corpora. The result is a corpus which is sense-tagged according to a corpus-derived sense inventory and where…

Inferring Implicit Relations with Language Models
Uri Katz, Mor Geva, Jonathan Berant
NAACL • UnImplicit • 2022
A prominent challenge for modern language understanding systems is the ability to answer implicit reasoning questions, where the required reasoning steps for answering the question are not mentioned in the text explicitly. In this work, we investigate why…

LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models
Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg
arXiv • 2022
The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred a wide interest in interpreting their predictions. However, current interpretation methods mostly focus on probing models from outside, executing behavioral…

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva, Avi Caciularu, Kevin Ro Wang, Yoav Goldberg
arXiv • 2022
Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by…

Text-based NP Enrichment
Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty
TACL • 2022
Understanding the relations between entities denoted by NPs in text is a critical part of human-like natural language understanding. However, only a fraction of such relations is covered by NLP tasks and models nowadays. In this work, we establish the task of…

SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy
arXiv • 2022
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We…

CommonsenseQA 2.0: Exposing the Limits of AI through Gamification
Alon Talmor, Ori Yoran, Ronan Le Bras, Chandrasekhar Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant
NeurIPS • 2021
Constructing benchmarks that test the abilities of modern natural language understanding models is difficult – pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors…