A short introduction to RLHF and post-training, focused on language models.
Explore completions from instruction-tuned models alongside their downstream RLHF counterparts.
The models used here come from open-source post-training pipelines at the Allen Institute for AI: the OLMo and Tülu model families. Intermediate checkpoints like these are rarely released at all, which is what makes this comparison possible. For each of the 18 models (9 pairs of SFT and RLHF models), we generated 3 completions for a static set of 16 prompts.
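The shape of the completion set described above can be sketched as follows. The model and prompt identifiers here are placeholders, not the actual names; only the counts come from the text.

```python
# Sketch of the completion grid: 9 SFT/RLHF pairs -> 18 models,
# each producing 3 completions for the same static set of 16 prompts.
# Identifiers below are hypothetical placeholders.
n_pairs = 9
models = [f"model-{i}-{stage}" for i in range(n_pairs) for stage in ("sft", "rlhf")]
prompts = [f"prompt-{j}" for j in range(16)]
completions_per_prompt = 3

total = len(models) * len(prompts) * completions_per_prompt
print(len(models), total)  # 18 models, 864 completions in total
```

Each SFT model is paired with the RLHF model trained on top of it, so every prompt can be compared side by side across the two stages.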
Feel free to use these examples for talks and other educational purposes; please cite the book and the authors of the models. The data is available here and is licensed under ODC-BY.