Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Continued Pretraining for Better Zero- and Few-Shot Promptability
Recently introduced language model prompting methods can achieve high accuracy in zero-and few-shot settings while requiring few to no learned task-specific parameters. Never-theless, these methods…
Exploring The Landscape of Distributional Robustness for Question Answering Models
We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets,…
Hyperdecoders: Instance-specific decoders for multi-task NLP
We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This…
Lila: A Unified Benchmark for Mathematical Reasoning
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this…
Abstract Visual Reasoning with Tangram Shapes
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly…
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation
The full power of human language-based communication cannot be realized without negation. All human languages have some form of negation. Despite this, negation remains a challenging phenomenon for…
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Large transformer models can highly improve Answer Sentence Selection (AS2) tasks, but their high computational costs prevent their use in many real-world applications. In this pa-per, we explore…
Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning . Such a capability would allow better…
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
While often assumed a gold standard, effective human evaluation of text generation remains an important, open area for research. We revisit this problem with a focus on pro-ducing consistent…
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this…