Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Data Governance in the Age of Large-Scale Data-Driven Language Technology

Yacine Jernite, Huu Nguyen, Stella Rose Biderman, Margaret Mitchell

2022

FAccT

The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language…

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

Sheshera Mysore, Arman Cohan, Tom Hope

2022

NAACL

We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the…

VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups

Zejiang Shen, Kyle Lo, Lucy Lu Wang, Doug Downey

2022

TACL

Accurately extracting structured content from PDFs is a critical first step for NLP over scientific papers. Recent work has improved extraction accuracy by incorporating elementary layout…

Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires

Thong Nguyen, Andrew Yates, Ayah Zirikly, Arman Cohan

2022

ACL

Automated methods have been widely used to identify and analyze mental health conditions (e.g., depression) from various sources of information, including social media. Yet, deployment of such…

Zero- and Few-Shot NLP with Pretrained Language Models

Iz Beltagy, Arman Cohan, Robert Logan IV, Sameer Singh

2022

ACL, tutorial

The ability to efficiently learn from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult. This is a challenging setting both academically…

Penguins Don't Fly: Reasoning about Generics through Instantiations and Exceptions

Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Yejin Choi

2022

arXiv

Generics express generalizations about the world (e.g., “birds can fly”). However, they are not universally true: while sparrows and penguins are both birds, only sparrows can fly and penguins…

Generating Scientific Claims for Zero-Shot Scientific Fact Checking

Dustin Wright, David Wadden, Kyle Lo, Lucy Lu Wang

2022

ACL

Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of training data, as annotation requires domain expertise. To address…

ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts

Sonia K. Murthy, Kyle Lo, Daniel King, Doug Downey

2022

arXiv

Systems that can automatically define unfamiliar terms hold the promise of improving the accessibility of scientific texts, especially for readers who may lack prerequisite background knowledge.…

PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

Wen Xiao, Iz Beltagy, G. Carenini, Arman Cohan

2022

ACL

We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning…

Scaling Creative Inspiration with Fine-Grained Functional Facets of Product Ideas

Tom Hope, Ronen Tamari, Hyeonsu Kang, Dafna Shahaf

2022

CHI

Web-scale repositories of products, patents and scientific papers offer an opportunity for building automated systems that scour millions of existing ideas and assist users in discovering novel…