  • Transformers as Soft Reasoners over Language

    Peter Clark, Oyvind Tafjord, Kyle Richardson
    IJCAI 2020
    AI has long pursued the goal of having systems reason over explicitly provided knowledge, but building suitable representations has proved challenging. Here we explore whether transformers can similarly learn to reason (or emulate reasoning), but using rules expressed in language, thus bypassing a…
  • Multi-class Hierarchical Question Classification for Multiple Choice Science Exams

    Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark
    IJCAI 2020
    Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the…
  • TransOMCS: From Linguistic Graphs to Commonsense Knowledge

    Hongming Zhang, Daniel Khashabi, Yangqiu Song, Dan Roth
    IJCAI 2020
    Commonsense knowledge acquisition is a key problem for artificial intelligence. Conventional methods of acquiring commonsense knowledge generally require laborious and costly human annotations, which are not feasible on a large scale. In this paper, we explore a practical way of mining commonsense…
  • Not All Claims are Created Equal: Choosing the Right Approach to Assess Your Hypotheses

    Erfan Sadeqi Azer, Daniel Khashabi, Ashish Sabharwal, Dan Roth
    ACL 2020
    Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues. While alternative proposals have been well-debated and adopted in other fields, they remain rarely…
  • A Formal Hierarchy of RNN Architectures

    William Merrill, Gail Garfinkel Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav
    ACL 2020
    We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties: space complexity, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine. We…
  • A Mixture of h-1 Heads is Better than h Heads

    Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
    ACL 2020
    Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks. Evidence has shown that they are overparameterized; attention heads can be pruned without significant performance loss. In this work, we instead "reallocate" them…
  • A Two-Stage Masked LM Method for Term Set Expansion

    Guy Kushilevitz, Shaul Markovitch, Yoav Goldberg
    ACL 2020
    We tackle the task of Term Set Expansion (TSE): given a small seed set of example terms from a semantic class, finding more members of that class. The task is of great practical utility, and also of theoretical utility as it requires generalization from few examples. Previous approaches to the TSE…
  • Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
    ACL 2020
    Best Paper Award Honorable Mention
    Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four…
  • Improving Transformer Models by Reordering their Sublayers

    Ofir Press, Noah A. Smith, Omer Levy
    ACL 2020
    Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with the language modeling objective. We observe that some of…
  • Injecting Numerical Reasoning Skills into Language Models

    Mor Geva, Ankit Gupta, Jonathan Berant
    ACL 2020
    Large pre-trained language models (LMs) are known to encode substantial amounts of linguistic information. However, high-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective only. Consequently, existing models for numerical reasoning have…