Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Parsing with Multilingual BERT, a Small Treebank, and a Small Corpus

Ethan C. Chau, Lucy H. Lin, Noah A. Smith

2020

Findings of EMNLP

Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties. This…

Plug and Play Autoencoders for Conditional Text Generation

Florian Mai, Nikolaos Pappas, I. Montero, Noah A. Smith

2020

EMNLP

Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only…

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

Samuel Gehman, Suchin Gururangan, Maarten Sap, Noah A. Smith

2020

Findings of EMNLP

Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can…

The Multilingual Amazon Reviews Corpus

Phillip Keung, Y. Lu, Gyorgy Szarvas, Noah A. Smith

2020

EMNLP

We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification. The corpus contains reviews in English, Japanese, German,…

TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions

Qiang Ning, Hao Wu, Rujun Han, Dan Roth

2020

EMNLP

A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However,…

Writing Strategies for Science Communication: Data and Computational Analysis

Tal August, Lauren Kim, Katharina Reinecke, Noah A. Smith

2020

EMNLP

Communicating complex scientific ideas without misleading or overwhelming the public is challenging. While science communication guides exist, they rarely offer empirical evidence for how their…

Evaluating Models' Local Decision Boundaries via Contrast Sets

M. Gardner, Y. Artzi, V. Basmova, et al.

2020

Findings of EMNLP

Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading:…

Break It Down: A Question Understanding Benchmark

Tomer Wolfson, Mor Geva, Ankit Gupta, Jonathan Berant

2020

TACL

Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning…

CORD-19: The Covid-19 Open Research Dataset

L. Lu Wang, K. Lo, Y. Chandrasekhar, S. Kohlmeier

2020

ACL • NLP-COVID

The Covid-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on Covid-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development…

A Formal Hierarchy of RNN Architectures

William Merrill, Gail Garfinkel Weiss, Yoav Goldberg, Eran Yahav

2020

ACL

We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties: space complexity, which measures the RNN's memory, and rational…
