Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Paper Plain: Making Medical Research Papers Approachable to Healthcare Consumers with Natural Language Processing
When seeking information not covered in patient-friendly documents, healthcare consumers may turn to the research literature. Reading medical papers, however, can be a challenging experience. To…
One-Shot Labeling for Automatic Relevance Estimation
Dealing with unjudged documents ("holes") in relevance assessments is a perennial problem when evaluating search systems with offline experiments. Holes can reduce the apparent effectiveness of…
A Search Engine for Discovery of Scientific Challenges and Directions
Keeping track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important knowledge.…
FLEX: Unifying Evaluation for Few-Shot NLP
Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental…
Towards Personalized Descriptions of Scientific Concepts
A single scientific concept can be described in many different ways, and the most informative description depends on the audience. In this paper, we propose generating personalized scientific…
CDLM: Cross-Document Language Modeling
We introduce a new pretraining approach for language models that are geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these…
MS2: Multi-Document Summarization of Medical Studies
To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this…
SciA11y: Converting Scientific Papers to Accessible HTML
We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes…
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers…
Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study
Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug…