Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

FLEX: Unifying Evaluation for Few-Shot NLP

Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy

2021

NeurIPS

Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental…

Towards Personalized Descriptions of Scientific Concepts

Sonia K. Murthy, Daniel King, Tom Hope, Doug Downey

2021

EMNLP 2021 • WiNLP

A single scientific concept can be described in many different ways, and the most informative description depends on the audience. In this paper, we propose generating personalized scientific…

CDLM: Cross-Document Language Modeling

Avi Caciularu, Arman Cohan, Iz Beltagy, Ido Dagan

2021

Findings of EMNLP

We introduce a new pretraining approach for language models that are geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these…

MS2: Multi-Document Summarization of Medical Studies

Jay DeYoung, Iz Beltagy, Madeleine van Zuylen, Lucy Lu Wang

2021

EMNLP

To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this…

SciA11y: Converting Scientific Papers to Accessible HTML

Lucy Lu Wang, Isabel Cachola, Jonathan Bragg, Daniel S. Weld

2021

ASSETS

We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes…

SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

Arie Cattan, Sophie Johnson, Daniel S. Weld, Tom Hope

2021

AKBC

Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers…

Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study

Rahul Nadkarni, David Wadden, Iz Beltagy, Tom Hope

2021

AKBC

Biomedical knowledge graphs (KGs) hold rich information on entities such as diseases, drugs, and genes. Predicting missing links in these graphs can boost many important applications, such as drug…

S2AND: A Benchmark and Evaluation System for Author Name Disambiguation

Shivashankar Subramanian, Daniel King, Doug Downey, Sergey Feldman

2021

JCDL

Author Name Disambiguation (AND) is the task of resolving which author mentions in a bibliographic database refer to the same real-world person, and is a critical ingredient of digital library…

Explaining Relationships Between Scientific Documents

Kelvin Luu, Xinyi Wu, Rik Koncel-Kedziorski, Noah A. Smith

2021

ACL

We address the task of explaining relationships between two scientific documents using natural language text. This task requires modeling the complex content of long technical documents, deducing a…

PAWLS: PDF Annotation With Labels and Structure

Mark Neumann, Zejiang Shen, Sam Skjonsberg

2021

Demo • ACL

Adobe’s Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup. This presents a challenge to NLP practitioners who wish to use the information…
