OLMES (Open Language Model Evaluation Standard) is a set of principles and associated tasks for evaluating large language models (LLMs). The current version includes:
For more details, see instructions here.
The curated few-shot examples can be found in this file: std_fewshot.py.
```bibtex
@misc{gu2024olmes,
  title={OLMES: A Standard for Language Model Evaluations},
  author={Yuling Gu and Oyvind Tafjord and Bailey Kuehl and
          Dany Haddad and Jesse Dodge and Hannaneh Hajishirzi},
  year={2024},
  eprint={2406.08446},
  archivePrefix={arXiv}
}
```