Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature

Anshul Kanakia, Kuansan Wang, Yuxiao Dong, Chieh-Han Wu
2020
Frontiers in Research Metrics and Analytics

On the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, have created an open research dataset called COVID-19 Research Dataset (CORD-19) to… 

Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions

Dongyeop Kang, Andrew Head, Risham Sidhu, Marti A. Hearst
2020
EMNLP • SDP workshop

The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition… 

PySBD: Pragmatic Sentence Boundary Disambiguation

Nipun Sadvilkar, M. Neumann
2020
EMNLP • NLP-OSS Workshop

In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical… 

Fact or Fiction: Verifying Scientific Claims

David Wadden, Kyle Lo, Lucy Lu Wang, Hannaneh Hajishirzi
2020
EMNLP

We introduce the task of scientific fact-checking. Given a corpus of scientific articles and a claim about a scientific finding, a fact-checking model must identify abstracts that support or refute… 

MedICaT: A Dataset of Medical Images, Captions, and Textual References

Sanjay Subramanian, Lucy Lu Wang, Sachin Mehta, Hannaneh Hajishirzi
2020
Findings of EMNLP

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of… 

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

Tom Hope, Jason Portenoy, Kishore Vasan, Jevin D. West
2020
EMNLP • Demo

The COVID-19 pandemic has sparked unprecedented mobilization of scientists, already generating thousands of new papers that join a litany of previous biomedical work in related areas. This deluge of… 

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

S. MacAvaney, Arman Cohan, N. Goharian
2020
EMNLP

With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of literature on the virus. Clinicians, researchers, and… 

TLDR: Extreme Summarization of Scientific Documents

Isabel Cachola, Kyle Lo, Arman Cohan, Daniel S. Weld
2020
Findings of EMNLP

We introduce TLDR generation for scientific papers, a new automatic summarization task with high source compression, requiring expert background knowledge and complex language understanding. To… 

ABNIRML: Analyzing the Behavior of Neural IR Models

Sean MacAvaney, Sergey Feldman, Nazli Goharian, Arman Cohan
2020
TACL

Numerous studies have demonstrated the effectiveness of pretrained contextualized language models such as BERT and T5 for ad-hoc search. However, it is not well understood why these methods are so…

Generative Data Augmentation for Commonsense Reasoning

Yiben Yang, Chaitanya Malaviya, Jared Fernandez, Doug Downey
2020
Findings of EMNLP

Recent advances in commonsense reasoning depend on large-scale human-annotated training data to achieve peak performance. However, manual curation of training examples is expensive and has been…