Research - Papers
Explore a selection of our published work on a variety of key research challenges in AI.
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
The inevitable appearance of spurious correlations in training datasets hurts the generalization of NLP models on unseen data. Previous work has found that datasets with paired inputs are prone to…
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
Adapting general-purpose language models to new skills is currently an expensive process that must be repeated as new instruction datasets targeting new skills are created, or can cause the models…
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data…
ComPO: Community Preferences for Language Model Personalization
Conventional algorithms for training language models (LMs) with human feedback rely on preferences that are assumed to account for an"average"user, disregarding subjectivity and finer-grained…
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling…
OLMoE: Open Mixture-of-Experts Language Models
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain…
Data Contamination Report from the 2024 CONDA Shared Task
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where…
Evaluating In-Context Learning of Libraries for Code Generation
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from…
The Bias Amplification Paradox in Text-to-Image Generation
Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by…
OLMES: A Standard for Language Model Evaluations
Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to…