Reinforcement Learning from Human Feedback Basics

Chapter Contents

Key Related Works

In this chapter we detail the key papers and projects that brought the RLHF field to where it is today. This is not intended to be a comprehensive review of RLHF and its related fields, but rather a starting point and a retelling of how the field arrived at its current state.

Early RL on Preferences

Christiano et al., etc.

RLHF on Language Models

Learning to summarize, first work on language models (Ziegler et al.)

Pre-Modern Models

InstructGPT, WebGPT, Sparrow, etc.