
Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Maarten Sap
2023
EMNLP

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM… 

Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Ashish Sabharwal
2023
EMNLP

When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer… 
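A minimal sketch of the kind of evaluation this abstract describes, assuming single-token answer choices and using a stand-in HuggingFace-style causal LM (model name and prompt are illustrative, not the paper's setup):

```python
# Sketch (not the paper's code): compare the raw probability mass an LM
# places on the answer choices with the renormalized distribution over them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: What color is the sky? A:"
choices = [" blue", " green", " red"]  # assumed single-token continuations

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
probs = logits.softmax(-1)

choice_ids = [tok.encode(c)[0] for c in choices]
raw = probs[choice_ids]                   # mass the LM puts on each choice
print("mass on all choices:", raw.sum().item())  # often well below 1.0
print("renormalized:", (raw / raw.sum()).tolist())
```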

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Ximing Lu, Faeze Brahman, Peter West, Yejin Choi
2023
EMNLP

Large language models excel at a variety of language tasks when prompted with examples or instructions. Yet controlling these models through prompting alone is limited. Tailoring language models… 
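A sketch of the general inference-time adaptation idea, not IPA's exact algorithm: a frozen base LM's next-token logits are combined with a small trainable adapter policy's logits at each decoding step (the `alpha` weighting knob and function names here are hypothetical):

```python
# Illustrative log-space mix of a frozen base LM with a small adapter policy.
import torch

def adapted_next_token(base_model, adapter, input_ids, alpha=1.0):
    """Sample the next token from a combined base + adapter distribution.

    base_model and adapter are assumed to be causal LMs sharing a vocabulary;
    only the adapter would be trained, while the base model stays frozen.
    """
    with torch.no_grad():
        base_logits = base_model(input_ids).logits[:, -1, :]
    adapter_logits = adapter(input_ids).logits[:, -1, :]   # small, trainable
    combined = base_logits + alpha * adapter_logits        # log-space mix
    probs = torch.softmax(combined, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```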

Language Models with Rationality

Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Peter Clark
2023
EMNLP

While large language models (LLMs) are proficient at question-answering (QA), the dependencies between their answers and other "beliefs" they may have about the world are typically unstated, and may… 

Machine Reading Comprehension using Case-based Reasoning

Dung Ngoc Thai, Dhruv Agarwal, Mudit Chaudhary, A. McCallum
2023
EMNLP

We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC)… 
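An illustrative sketch of the case-based flavor of this approach, not CBR-MRC itself: retrieve the most similar stored (question, context, answer) cases and reuse their evidence when extracting an answer for a new question (embeddings are assumed to come from any sentence encoder):

```python
# Nearest-neighbor case retrieval by cosine similarity.
import numpy as np

def retrieve_cases(query_vec, case_vecs, k=3):
    """Return indices of the k most similar stored cases.

    case_vecs: (num_cases, dim) embeddings of past questions (precomputed);
    query_vec: (dim,) embedding of the new question.
    """
    sims = case_vecs @ query_vec / (
        np.linalg.norm(case_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]
```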

Measuring and Narrowing the Compositionality Gap in Language Models

Ofir Press, Muru Zhang, Sewon Min, Mike Lewis
2023
EMNLP Findings

We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often… 
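A minimal sketch of the compositionality-gap measurement the abstract describes: among questions whose sub-questions the model answers correctly, what fraction of the composed questions does it still get wrong? (Record field names are hypothetical.)

```python
# Compositionality gap: P(full answer wrong | all sub-answers right).
def compositionality_gap(records):
    subs_right = [r for r in records if all(r["sub_correct"])]
    failed = [r for r in subs_right if not r["full_correct"]]
    return len(failed) / max(len(subs_right), 1)

records = [
    {"sub_correct": [True, True], "full_correct": False},
    {"sub_correct": [True, True], "full_correct": True},
    {"sub_correct": [True, False], "full_correct": False},
]
print(compositionality_gap(records))  # 0.5
```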

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

Kyle Lo, Zejiang Shen, Benjamin Newman, Luca Soldaini
2023
EMNLP

Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in… 
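A hedged usage sketch for the open-source papermage toolkit; the names below follow the project's published examples, but consult the repository (github.com/allenai/papermage) for the current API before relying on them:

```python
# Parse a PDF into a layered Document and iterate over its text units.
from papermage.recipes import CoreRecipe

recipe = CoreRecipe()
doc = recipe.run("paper.pdf")  # path is illustrative

# Layers expose textual and visual structure; layer names may vary by version.
for sentence in doc.sentences:
    print(sentence.text[:80])
```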

SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks

Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Hannaneh Hajishirzi
2023
EMNLP

We introduce SHARCS, an approach to adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples… 
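An illustrative sketch of per-sample width routing in the spirit of this idea, not the paper's implementation: a small router scores each input and dispatches it to a narrower or wider sub-network (the module and function names are hypothetical):

```python
# Route each sample to a sub-network of a chosen width.
import torch
import torch.nn as nn

class WidthRouter(nn.Module):
    def __init__(self, dim, num_widths=2):
        super().__init__()
        self.score = nn.Linear(dim, num_widths)   # hardness classifier

    def forward(self, pooled):                    # pooled: (batch, dim)
        return self.score(pooled).argmax(dim=-1)  # chosen width per sample

def route(batch, router, subnets):
    """Run each sample through the sub-network the router selects for it."""
    choices = router(batch.mean(dim=1))           # pool tokens, then score
    outs = [subnets[int(c)](x.unsqueeze(0)) for x, c in zip(batch, choices)]
    return torch.cat(outs, dim=0)
```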

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Hyunwoo Kim, Jack Hessel, Liwei Jiang, Yejin Choi
2023
EMNLP

We present SODA: the first publicly available, million-scale, high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous… 

TaskWeb: Selecting Better Source Tasks for Multi-task NLP

Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi
2023
EMNLP

Recent work in NLP has shown promising results in training models on large amounts of tasks to achieve better generalization. However, it is not well-understood how tasks are related, and how…