Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity
We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the…
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a…
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups
Accurately extracting structured content from PDFs is a critical first step for NLP over scientific papers. Recent work has improved extraction accuracy by incorporating elementary layout…
What Language Model to Train if You Have One Million GPU Hours?
The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations that transfer across tasks and scale,…
Retrieval Data Augmentation Informed by Downstream Question Answering Performance
Training retrieval models to fetch contexts for Question Answering (QA) over large corpora requires labeling relevant passages in those corpora. Since obtaining exhaustive manual annotations of all…
Quark: Controllable Text Generation with Reinforced Unlearning
Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a…
Multimodal Knowledge Alignment with Reinforcement Learning
Large language models readily adapt to novel settings, even without task-specific training data. Can their zero-shot capacity be extended to multimodal inputs? In this work, we propose ESPER which…
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them. To address this issue, we introduce ProsocialDialog,…
NaturalProver: Grounded Mathematical Proof Generation with Language Models
Theorem proving in natural mathematical language - the mixture of symbolic and natural language used by humans - plays a central role in mathematical advances and education, and tests aspects of…
Investigating the Benefits of Free-Form Rationales
Free-form rationales aim to aid model interpretability by supplying the background knowledge that can help understand model decisions. Crowdsourced rationales are provided for commonsense QA…