Papers

  • Language (Re)modelling: Towards Embodied Language Understanding

    Ronen Tamari, Chen Shani, Tom Hope, Miriam R. L. Petruck, Omri Abend, Dafna Shahaf. ACL 2020. While natural language understanding (NLU) is advancing rapidly, today’s technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency, interpretability, and generalization. This work proposes an approach to representation and learning based on…
  • S2ORC: The Semantic Scholar Open Research Corpus

    Kyle Lo, Lucy Lu Wang, Mark E Neumann, Rodney Michael Kinney, Daniel S. Weld. ACL 2020. We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated paper metadata. We provide structured full text for 8.1M open access papers. All inline citation…
  • SciREX: A Challenge Dataset for Document-Level Information Extraction

    Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, Iz Beltagy. ACL 2020. Extracting information from full documents is an important problem in many domains, but most previous work focuses on identifying relationships within a sentence or a paragraph. It is challenging to create a large-scale information extraction (IE) dataset at the document level since it requires an…
  • SPECTER: Document-level Representation Learning using Citation-informed Transformers

    Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld. ACL 2020. Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter…
  • Stolen Probability: A Structural Weakness of Neural Language Models

    David Demeter, Gregory Kimmel, Doug Downey. ACL 2020. Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vectors in a high-dimensional embedding space. The dot-product distance metric forms part of the…
  • SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

    Tom Hope, Jason Portenoy, Kishore Vasan, Jonathan Borchardt, Eric Horvitz, Daniel S. Weld, Marti A. Hearst, Jevin D. West. bioRxiv 2020. The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of information makes it hard for researchers to keep track of their own field, let alone explore new…
  • TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection

    Ellen M. Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R. Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, Lucy Lu Wang. arXiv 2020. TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic. One of the key characteristics of pandemic search is the accelerated rate of change: the topics of interest…
  • SLEDGE: A Simple Yet Effective Baseline for Coronavirus Scientific Knowledge Search

    Sean MacAvaney, Arman Cohan, Nazli Goharian. arXiv 2020. With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and policy-makers need a way to effectively search these articles. In this work, we present a search system…
  • TREC-COVID: Rationale and Structure of an Information Retrieval Shared Task for COVID-19

    Kirk Roberts, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, Kyle Lo, Ian Soboroff, Ellen M. Voorhees, Lucy Lu Wang, William R. Hersh. JAMIA 2020. TREC-COVID is an information retrieval (IR) shared task initiated to support clinicians and clinical research during the COVID-19 pandemic. IR for pandemics breaks many normal assumptions, which can be seen by examining nine important basic IR research questions related to pandemic situations. TREC…
  • Fact or Fiction: Verifying Scientific Claims

    David Wadden, Kyle Lo, Lucy Lu Wang, Shanchuan Lin, Madeleine van Zuylen, Arman Cohan, Hannaneh Hajishirzi. arXiv 2020. We introduce the task of scientific fact-checking. Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute the claim. In addition, it must provide rationales for its predictions in the form of evidentiary…