Reinforcement Learning from Human Feedback Basics

Regularization

Throughout the RLHF optimization, many regularization steps are used to prevent over-optimization of the reward model.

Chapter Contents

KL Distances

Reference Policy

Reference Dataset

Likelihood Penalty

Reward Bonuses

Margin Losses
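
The over-optimization mentioned above is most commonly controlled by the first items in this list: penalizing the distance between the policy being trained and a frozen reference policy. As a generic illustration (a standard form from the RLHF literature, not a specific recipe from this chapter, with the penalty weight written here as beta), the reward passed to the policy optimizer is often shaped as

    r(x, y) = r_\theta(x, y) - \beta \, \mathcal{D}_{\mathrm{KL}}\!\left(\pi^{\mathrm{RL}}(y \mid x) \,\|\, \pi^{\mathrm{ref}}(y \mid x)\right)

where r_\theta is the reward model, \pi^{\mathrm{RL}} is the policy being optimized, \pi^{\mathrm{ref}} is the reference policy (typically the model that initialized RL training), and \beta trades reward maximization against staying close to the reference.

In practice the KL term is usually estimated per token from log-probabilities of the sampled completion. A minimal sketch, assuming PyTorch tensors and a hypothetical helper name:

    import torch

    def kl_penalized_reward(scores, policy_logprobs, ref_logprobs, beta=0.1):
        # scores:          (batch,) reward-model scores for sampled completions
        # policy_logprobs: (batch, seq_len) log-probs of the sampled tokens under the current policy
        # ref_logprobs:    (batch, seq_len) log-probs of the same tokens under the frozen reference
        # beta:            penalty weight (illustrative value, not from this chapter)
        per_token_kl = policy_logprobs - ref_logprobs      # sample-based KL estimate per token
        return scores - beta * per_token_kl.sum(dim=-1)    # subtract sequence-level penalty from reward

Summing the per-token estimate over the sequence gives a sample-based approximation of the sequence-level KL distance; larger values of beta keep the policy closer to the reference at the cost of lower raw reward.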