A Little Bit of Reinforcement Learning from Human Feedback

A short introduction to RLHF and post-training focused on language models.

Nathan Lambert


Definitions

This chapter collects the definitions, symbols, and operations frequently used in the RLHF process.

ML Definitions

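Kullback-Leibler (KL) divergence, a measure of the difference between two probability distributions. For discrete distributions \(P\) and \(Q\) defined over the same space \(\mathcal{X}\):

\[ D_{KL}(P || Q) = \sum_{x \in \mathcal{X}} P(x) \log \left(\frac{P(x)}{Q(x)}\right) \]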
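As a rough illustration of the sum above, which is the Kullback-Leibler (KL) divergence between two discrete distributions, the sketch below evaluates it in plain Python. The function name `kl_divergence`, the list-based representation of \(P\) and \(Q\), and the example values are assumptions made for this example, not part of the original text. It uses the natural logarithm and the usual conventions that terms with \(P(x) = 0\) contribute nothing and that the divergence is infinite when \(P\) puts mass where \(Q\) has none.

```python
import math


def kl_divergence(p, q):
    """Compute D_KL(P || Q) for two discrete distributions.

    `p` and `q` are hypothetical inputs: equal-length lists of probabilities
    over the same outcomes, each assumed to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            # Convention: a term with P(x) = 0 contributes 0 to the sum.
            continue
        if q_x == 0.0:
            # P assigns mass to an outcome that Q rules out.
            return math.inf
        total += p_x * math.log(p_x / q_x)
    return total


# Illustrative values only.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # D_KL(P || Q)
print(kl_divergence(q, p))  # D_KL(Q || P), generally a different value
```

Swapping the arguments generally changes the result, which is why the order in \(D_{KL}(P || Q)\) matters: the divergence is not symmetric.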

NLP Definitions

RL Definitions
