Research - Papers
Explore a selection of our published work on key research challenges in AI.
Task-aware Retrieval with Instructions
We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware…
When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the difficulty of encoding a wealth of world…
Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we…
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging. This work proposes Chain-of-Thought Hub, an open-source evaluation…
ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews
Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in…
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Generative AI systems spanning text, image, audio, and video modalities have broad social impacts, but there is no official standard for evaluating those impacts or which…
Morphosyntactic probing of multilingual BERT models
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a…
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent…
Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available. In this paper, we introduce Z-ICL, a new…
Just CHOP: Embarrassingly Simple LLM Compression
Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities, but at the cost of a high computational footprint. A growing assortment of compression methods promises to reduce…