Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text

Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Yejin Choi
2021
arXiv

Modern neural language models can produce remarkably fluent and grammatical text. So much, in fact, that recent work by Clark et al. (2021) has reported that conventional crowdsourcing can no longer… 

Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral

Lang Liu, Krishna Pillutla, S. Welleck, Z. Harchaoui
2021
arXiv

The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework… 

TIMEDIAL: Temporal Commonsense Reasoning in Dialog

Lianhui Qin, Aditya Gupta, Shyam Upadhyay, Manaal Faruqui
2021
ACL

Everyday conversations require understanding everyday events, which in turn, requires understanding temporal commonsense concepts interwoven with those events. Despite recent progress with massive… 

"I'm Not Mad": Commonsense Implications of Negation and Contradiction

Liwei Jiang, Antoine Bosselut, Chandra Bhagavatula, Yejin Choi
2021
NAACL

Natural language inference requires reasoning about contradictions, negations, and their commonsense implications. Given a simple premise (e.g., “I’m mad at you”), humans can reason about the… 

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

Alisa Liu, Maarten Sap, Ximing Lu, Yejin Choi
2021
ACL

Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for… 

Challenges in Algorithmic Debiasing for Toxic Language Detection

Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Yejin Choi
2021
EACL

Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently… 

Challenges in Automated Debiasing for Toxic Language Detection

Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Yejin Choi
2021
EACL

Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently… 

Discourse Understanding and Factual Consistency in Abstractive Summarization

Saadia Gabriel, Antoine Bosselut, Jeff Da, Yejin Choi
2021
EACL

We introduce Cooperative Generator-Discriminator Networks (Co-opNet), a general framework for abstractive summarization with distinct modeling of the narrative flow in the output summary. Most… 

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

Jack Hessel, Ari Holtzman, Maxwell Forbes, Yejin Choi
2021
EMNLP

Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans. This is in contrast to the reference-free… 

Misinfo Reaction Frames: Reasoning about Readers' Reactions to News Headlines (preprint)

Saadia Gabriel, Skyler Hallinan, Maarten Sap, Yejin Choi
2021
ACL

Even to a simple and short news headline, readers react in a multitude of ways: cognitively (e.g., inferring the writer's intent), emotionally (e.g., feeling distrust), and behaviorally (e.g.,…