An abstract illustration of swirling shapes, meant to denote a futuristic feeling.

Research - Papers

Explore a selection of our published work on a variety of key research challenges in AI.

TaskWeb: Selecting Better Source Tasks for Multi-task NLP

Joongwon KimAkari AsaiGabriel IlharcoHannaneh Hajishirzi

2023

EMNLP

Recent work in NLP has shown promising results in training models on large amounts of tasks to achieve better generalization. However, it is not well-understood how tasks are related, and how…

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

Jiacheng LiuWenya WangDianzhuo WangHanna Hajishirzi

2023

EMNLP

Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects…

We're Afraid Language Models Aren't Modeling Ambiguity

Alisa LiuZhaofeng WuJulian MichaelYejin Choi

2023

EMNLP

Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our…

S2abEL: A Dataset for Entity Linking from Scientific Tables

Yuze LouBailey KuehlErin BransomDoug Downey

2023

EMNLP

Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in…

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Ori YoranTomer WolfsonBen BoginJonathan Berant

2023

EMNLP

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple…

Continued Pretraining for Better Zero- and Few-Shot Promptability

Zhaofeng WuRobert L. Logan IVPete WalshIz Beltagy

2022

EMNLP

Recently introduced language model prompting methods can achieve high accuracy in zero-and few-shot settings while requiring few to no learned task-speciﬁc parameters. Never-theless, these methods…

Exploring The Landscape of Distributional Robustness for Question Answering Models

Anas AwadallaMitchell WortsmanGabriel IlharcoLudwig Schmidt

2022

Findings of EMNLP

We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets,…

Hyperdecoders: Instance-specific decoders for multi-task NLP

Hamish IvisonMatthew E. Peters

2022

Findings of EMNLP

We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efﬁcient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This…

Lila: A Unified Benchmark for Mathematical Reasoning

Swaroop MishraMatthew FinlaysonPan LuAshwin Kalyan

2022

EMNLP

Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this…

Abstract Visual Reasoning with Tangram Shapes

Anya JiNoriyuki KojimaN. RushYoav Artzi

2022

EMNLP

We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly…

Previous31-40Next