
Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

Faith and Fate: Limits of Transformers on Compositionality

Nouha Dziri, Ximing Lu, Melanie Sclar, Yejin Choi

2023

NeurIPS

Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures…

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Ranjay Krishna

2023

NeurIPS

In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models has permeated the machine learning ecosystem. Given an image, these benchmarks…

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Amanpreet Singh, Mike D'Arcy, Arman Cohan, Sergey Feldman

2023

EMNLP

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these…

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Hyunwoo Kim, Jack Hessel, Liwei Jiang, Yejin Choi

2023

EMNLP

We present SODA: the first publicly available, million-scale high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous…

We're Afraid Language Models Aren't Modeling Ambiguity

Alisa Liu, Zhaofeng Wu, Julian Michael, Yejin Choi

2023

EMNLP

Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our…

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Hanna Hajishirzi

2023

EMNLP

Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects…

Language Models with Rationality

Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Peter Clark

2023

EMNLP

While large language models (LLMs) are proficient at question-answering (QA), the dependencies between their answers and other "beliefs" they may have about the world are typically unstated, and may…

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Ashish Sabharwal

2023

EMNLP

When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer…

Editing Common Sense in Transformers

Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Niket Tandon*

2023

EMNLP

Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training. However, these editing methods have only been evaluated on…

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Yulia Tsvetkov

2023

EMNLP

Language models have graduated from being research prototypes to commercialized products offered as web APIs, and recent works have highlighted the multilingual capabilities of these products. The…
