Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Continued Pretraining for Better Zero- and Few-Shot Promptability
Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods…
Exploring The Landscape of Distributional Robustness for Question Answering Models
We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets,…
Lila: A Unified Benchmark for Mathematical Reasoning
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this…
Statistical and Computational Guarantees for Influence Diagnostics
Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful…
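For orientation, the classic influence function (in the standard Koh-and-Liang-style formulation; the notation here is generic, not necessarily the paper's) estimates how upweighting one training point z shifts the loss at a test point z_test:

\[
\mathcal{I}(z, z_{\mathrm{test}}) = -\,\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top} \, H_{\hat\theta}^{-1} \, \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta).
\]

The approximate maximum influence perturbation then asks, roughly, which small subset of training points would most change a given conclusion if dropped.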
What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment
The instruction learning paradigm—where a model learns to perform new tasks from task descriptions alone—has become popular in general-purpose model research. The capabilities of large transformer…
Teaching Broad Reasoning Skills via Decomposition-Guided Contexts
Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion.…
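As an illustrative case (not drawn from the paper): a multi-hop question such as "Who lived longer, Mozart or Beethoven?" decomposes into sub-questions about each lifespan plus a comparison step, and such decompositions can guide the construction of training contexts that exercise each skill.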
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Integrating vision and language has gained notable attention following the success of pretrained language models. Despite that, only a fraction of emerging multimodal models is suitable for text…
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
A recurring challenge of crowdsourcing NLP datasets at scale is that human writers often rely on repetitive patterns when crafting examples, leading to a lack of linguistic diversity. We introduce a…
Modeling Context With Linear Attention for Scalable Document-Level Translation
Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations. However, these models, predominantly based on transformers, are…
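A minimal sketch of the generic linear-attention trick the title points to (the standard kernelized formulation with an elu + 1 feature map; the paper's exact feature map and document-level conditioning may differ):

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map commonly used in kernelized
    # ("linear") attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """O(n) attention: summarize all keys/values once, reuse per query.

    Q, K: (n, d); V: (n, d_v). Softmax attention costs O(n^2) in the
    sequence length n; here the n x n attention matrix is never built.
    """
    Qf, Kf = feature_map(Q), feature_map(K)  # (n, d)
    KV = Kf.T @ V                            # (d, d_v) summary of all keys/values
    Z = Qf @ Kf.sum(axis=0)                  # (n,) per-query normalizers
    return (Qf @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (8, 16)
```

Because the key/value summary has a fixed size, cost grows linearly with sequence length, which is what makes attending over whole documents rather than single sentences affordable.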
Lexical Generalization Improves with Larger Models and Longer Training
While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive reliance on such heuristics can lead to…
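An illustrative case of such a heuristic: an NLI model relying on lexical overlap may predict entailment for "The doctor paid the actor" versus "The actor paid the doctor" purely because the two sentences share all their words.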