Papers

  • SpanBERT: Improving Pre-training by Representing and Predicting Spans

    Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. EMNLP 2019. We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to…
  • GrapAL: Connecting the Dots in Scientific Literature

    Christine Betts, Joanna Power, Waleed Ammar. ACL 2019. We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating a knowledge base of scientific literature that was semi-automatically constructed using NLP methods. GrapAL satisfies a variety of use cases and…
  • ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

    Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar. ACL • BioNLP Workshop 2019. Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing…
  • CEDR: Contextualized Embeddings for Document Ranking

    Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian. SIGIR 2019. Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized…
  • Ontology-Aware Clinical Abstractive Summarization

    Sean MacAvaney, Sajad Sotudeh, Arman Cohan, Nazli Goharian, Ish Talati, Ross W. Filice. SIGIR 2019. Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors. We propose a sequence-to-sequence abstractive summarization model augmented with domain-specific ontological…
  • Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction

    Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, Oren Etzioni. JAMA 2019. Importance: Analyses of female representation in clinical studies have been limited in scope and scale. Objective: To perform a large-scale analysis of global enrollment sex bias in clinical studies. Design, Setting, and Participants: In this cross…
  • Combining Distant and Direct Supervision for Neural Relation Extraction

    Iz Beltagy, Kyle Lo, Waleed Ammar. NAACL 2019. In relation extraction with distant supervision, noisy labels make it difficult to train quality models. Previous neural models addressed this problem using an attention mechanism that attends to sentences that are likely to express the relations. We improve…
  • Structural Scaffolds for Citation Intent Classification in Scientific Publications

    Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady. NAACL 2019. Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated analysis of the scientific literature. We propose a multitask…
  • Citation Count Analysis for Papers with Preprints

    Sergey Feldman, Kyle Lo, Waleed Ammar. ArXiv 2018. We explore the degree to which papers prepublished on arXiv garner more citations, in an attempt to paint a sharper picture of fairness issues related to prepublishing. A paper’s citation count is estimated using a negative-binomial generalized linear model…
  • Construction of the Literature Graph in Semantic Scholar

    Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew E. Peters, et al. NAACL-HLT 2018. We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers…