Viewing 27 papers in Semantic Scholar
    • EMNLP 2019
      Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, Daniel S. Weld
      As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in context of the document. Recent successful models for this task have used hierarchical models to…
    • EMNLP 2019
      Iz Beltagy, Kyle Lo, Arman Cohan
      Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised…
    • EMNLP 2019
      Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy
      We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span…
    • EMNLP 2019
      Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
      We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but…
    • ACL 2019
      Christine Betts, Joanna Power, Waleed Ammar
      We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating a knowledge base of scientific literature, that was semi-automatically constructed using NLP methods. GrapAL satisfies a variety of use cases and information needs requested by researchers…
    • ACL • BioNLP Workshop 2019
      Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar
      Despite recent advances in natural language processing, many statistical models for processing text perform extremely poorly under domain shift. Processing biomedical and clinical text is a critically important application area of natural language processing, for which there are few robust…
    • JAMA 2019
      Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, Oren Etzioni
      Importance: Analyses of female representation in clinical studies have been limited in scope and scale. Objective: To perform a large-scale analysis of global enrollment sex bias in clinical studies. Design, Setting, and Participants: In this cross-sectional study, clinical studies from published…
    • arXiv 2019
      Lucy Lu Wang, Gabriel Stanovsky, Luca Weihs, Oren Etzioni
      A comprehensive and up-to-date analysis of Computer Science literature (2.87 million papers through 2018) reveals that, if current trends continue, parity between the number of male and female authors will not be reached in this century. Under our most optimistic projection models, gender parity is…
    • NAACL 2019
      Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady
      Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated analysis of the scientific literature. We propose a multitask approach to incorporate information in…
    • NAACL 2019
      Iz Beltagy, Kyle Lo, Waleed Ammar
      In relation extraction with distant supervision, noisy labels make it difficult to train quality models. Previous neural models addressed this problem using an attention mechanism that attends to sentences that are likely to express the relations. We improve such models by combining the distant…
    • arXiv 2018
      Sergey Feldman, Kyle Lo, Waleed Ammar
      We explore the degree to which papers prepublished on arXiv garner more citations, in an attempt to paint a sharper picture of fairness issues related to prepublishing. A paper’s citation count is estimated using a negative-binomial generalized linear model (GLM) while observing a binary variable…
    • NAACL-HLT 2018
      Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew E. Peters, et al.
      We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions…
    • NAACL-HLT 2018
      Chandra Bhagavatula, Sergey Feldman, Russell Power, Waleed Ammar
      We present a content-based method for recommending citations in an academic paper draft. We embed a given query document into a vector space, then use its nearest neighbors as candidates, and rerank the candidates using a discriminative model trained to distinguish between observed and unobserved…
    • NAACL-HLT 2018 Dataset
      Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz
      Peer reviewing is a central component in the scientific publishing process. We present the first public dataset of scientific peer reviews available for research purposes (PeerRead v1), providing an opportunity to study this important artifact. The dataset consists of 14.7K paper drafts and the…
    • JCDL 2018
      Noah Siegel, Nicholas Lourie, Russell Power and Waleed Ammar
      Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In this paper, we induce high-quality training labels for the…
    • ACL • Proceedings of the BioNLP 2018 Workshop 2018
      Lucy L. Wang, Chandra Bhagavatula, M. Neumann, Kyle Lo, Chris Wilhelm, Waleed Ammar
      Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching entities in an…
    • ACL 2017
      Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power
      Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context sensitive representations is trained on relatively…
    • WWW 2017
      Chenyan Xiong, Russell Power and Jamie Callan
      This paper introduces Explicit Semantic Ranking (ESR), a new ranking technique that leverages knowledge graph embedding. Analysis of the query log from our academic search engine, SemanticScholar.org, reveals that a major error source is its inability to understand the meaning of research concepts…
    • SemEval 2017
      Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, and Russell Power
      This paper describes our submission for the ScienceIE shared task (SemEval-2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation extraction model of Miwa and Bansal (2016) with several enhancements such as semi-supervised learning via…
    • JCDL 2017
      Luca Weihs and Oren Etzioni
      Citations implicitly encode a community's judgment of a paper's importance and thus provide a unique signal by which to study scientific impact. Efforts in understanding and refining this signal are reflected in the probabilistic modeling of citation networks and the proliferation of citation-based…
    • SIGIR 2017
      Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power
      This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features…
    • Nature 2017
      Oren Etzioni
      The number of times a paper is cited is a poor proxy for its impact (see P. Stephan et al. Nature 544, 411–412; 2017). I suggest relying instead on a new metric that uses artificial intelligence (AI) to capture the subset of an author's or a paper's essential and therefore most highly influential…
    • ACL 2017
      Pradeep Dasigi, Waleed Ammar, Chris Dyer, and Eduard Hovy
      Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by…
    • CSCW 2016
      Shih-Wen Huang, Jonathan Bragg, Isaac Cowhey, Oren Etzioni, and Daniel S. Weld
      Successful online communities (e.g., Wikipedia, Yelp, and StackOverflow) can produce valuable content. However, many communities fail in their initial stages. Starting an online community is challenging because there is not enough content to attract a critical mass of active members. This paper…
    • JCDL 2016
      Christopher Clark and Santosh Divvala
      Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or presenting document summaries to users. To facilitate these applications we develop an algorithm that…
    • AAAI • Workshop on Scholarly Big Data 2015
      Christopher Clark and Santosh Divvala
      Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization, and as part of larger systems that seek to gain deeper, semantic understanding of these articles. While many "off-the-shelf" tools…
    • AAAI • Workshop on Scholarly Big Data 2015
      Marco Valenzuela, Vu Ha, and Oren Etzioni
      We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that…