Ai2

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.


RealTime QA: What's the Answer Right Now?

Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Kentaro Inui
2023
NeurIPS

We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). REALTIME QA inquires about the… 

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Bill Yuchen Lin, Yicheng Fu, Karina Yang, Xiang Ren
2023
NeurIPS

We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage… 

Editing Common Sense in Transformers

Anshita Gupta*, Debanjan Mondal*, Akshay Krishna Sheshadri*, Niket Tandon*
2023
EMNLP

Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training. However, these editing methods have only been evaluated on… 

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Maarten Sap
2023
EMNLP

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM… 

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

Ximing Lu, Faeze Brahman, Peter West, Yejin Choi
2023
EMNLP

Large language models excel at a variety of language tasks when prompted with examples or instructions. Yet controlling these models through prompting alone is limited. Tailoring language models… 

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Hyunwoo Kim, Jack Hessel, Liwei Jiang, Yejin Choi
2023
EMNLP

We present SODA: the first publicly available, million-scale high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous… 

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Hanna Hajishirzi
2023
EMNLP

Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects… 

We're Afraid Language Models Aren't Modeling Ambiguity

Alisa Liu, Zhaofeng Wu, Julian Michael, Yejin Choi
2023
EMNLP

Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our… 

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yejin Choi
2023
EMNLP • Findings

Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or… 

"You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation

Allyson Ettinger, Jena D. Hwang, Valentina Pyatkin, Yejin Choi
2023
EMNLP

Large language models (LLMs) show amazing proficiency and fluency in the use of language. Does this mean that they have also acquired insightful linguistic knowledge about the language, to an extent…