Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

PySBD: Pragmatic Sentence Boundary Disambiguation

Nipun Sadvilkar, M. Neumann
2020
EMNLP • NLP-OSS Workshop

In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical… 

Fact or Fiction: Verifying Scientific Claims

David Wadden, Kyle Lo, Lucy Lu Wang, Hannaneh Hajishirzi
2020
EMNLP

We introduce the task of scientific fact-checking. Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute… 

MedICaT: A Dataset of Medical Images, Captions, and Textual References

Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Hannaneh Hajishirzi
2020
Findings of EMNLP

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of… 

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

Tom Hope, Jason Portenoy, Kishore Vasan, Jevin D. West
2020
EMNLP • Demo

The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of… 

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

S. MacAvaney, Arman Cohan, N. Goharian
2020
EMNLP

With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and… 

TLDR: Extreme Summarization of Scientific Documents

Isabel Cachola, Kyle Lo, Arman Cohan, Daniel S. Weld
2020
Findings of EMNLP

We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression, requiring expert background knowledge and complex language understanding. To… 

ABNIRML: Analyzing the Behavior of Neural IR Models

Sean MacAvaney, Sergey Feldman, Nazli Goharian, Arman Cohan
2020
TACL

Numerous studies have demonstrated the effectiveness of pretrained contextualized language models such as BERT and T5 for ad-hoc search. However, it is not well understood why these methods are so… 

Generative Data Augmentation for Commonsense Reasoning

Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Doug Downey
2020
Findings of EMNLP

Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance. However, manual curation of training examples is expensive and has been… 

Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project

E. Ong, L. Lu Wang, J. Schaub, et al.
2020
Nature Reviews Nephrology

An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National… 

High-Precision Extraction of Emerging Concepts from Scientific Literature

Daniel King, Doug Downey, Daniel S. Weld
2020
SIGIR

Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual…