Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Searching for Scientific Evidence in a Pandemic: An Overview of TREC-COVID
We present an overview of the TREC-COVID Challenge, an information retrieval (IR) shared task to evaluate search on scientific literature related to COVID-19. The goals of TREC-COVID include the…
Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users
The majority of scientific papers are distributed in PDF, which poses challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by…
LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for…
Gender trends in computer science authorship
A comprehensive and up-to-date analysis of Computer Science literature (2.87 million papers through 2018) reveals that, if current trends continue, parity between the number of male and female…
On Generating Extended Summaries of Long Documents
Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary helps get a high-level view of a given document, it is desirable in…
Optimizing AI for Teamwork
In many high-stakes domains such as criminal justice, finance, and healthcare, AI systems may recommend actions to a human expert responsible for final decisions, a context known as AI-advised…
GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited…
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
More than 50 000 papers have been published about COVID-19 since the beginning of 2020 and several hundred new papers continue to be published every day. This incredible rate of scientific…
Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature
At the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, have created an open research dataset called the COVID-19 Open Research Dataset (CORD-19) to…
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition…