Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Generative Data Augmentation for Commonsense Reasoning
Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance. However, manual curation of training examples is expensive and has been…
Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project
An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National…
High-Precision Extraction of Emerging Concepts from Scientific Literature
Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual…
CORD-19: The COVID-19 Open Research Dataset
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development…
SUPP.AI: finding evidence for supplement-drug interactions
Dietary supplements are used by a large portion of the population, but information on their pharmacologic interactions is incomplete. To address this challenge, we present SUPP.AI, an…
Language (Re)modelling: Towards Embodied Language Understanding
While natural language understanding (NLU) is advancing rapidly, today’s technology differs from human-like language understanding in fundamental ways, notably in its inferior efficiency,…
S2ORC: The Semantic Scholar Open Research Corpus
We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated…
SciREX: A Challenge Dataset for Document-Level Information Extraction
Extracting information from full documents is an important problem in many domains, but most previous work focuses on identifying relationships within a sentence or a paragraph. It is challenging to…
SPECTER: Document-level Representation Learning using Citation-informed Transformers
Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are…
Stolen Probability: A Structural Weakness of Neural Language Models
Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word…
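The output layer this abstract describes can be made concrete with a short sketch. The snippet below is a minimal illustration, not code from the paper: the vocabulary size, embedding dimension, and variable names (E, h) are hypothetical, and it shows only the standard softmax-over-dot-products step that the abstract refers to.

```python
import numpy as np

# Hypothetical sizes: a 5-word vocabulary with 4-dimensional embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))  # word embedding matrix, one row per word
h = rng.normal(size=(4,))    # prediction (hidden-state) vector

# Dot product of the prediction vector with every word vector ...
logits = E @ h

# ... followed by a softmax (shifted by the max for numerical stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs)        # probability assigned to each word in the vocabulary
print(probs.sum())  # 1.0: a valid probability distribution
```

Because every logit is a dot product with the same vector h, the geometry of the embedding space constrains which words can ever receive high probability, which is the structural weakness the paper examines.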