The Basics of Reinforcement Learning from Human Feedback
Introductions
Introduction
What are preferences?
Optimization and RL
Seminal (Recent) Works
Problem Setup
Definitions
Preference Data
Reward Modeling
Regularization
Optimization
Instruction Tuning
Rejection Sampling
Policy Gradients
Direct Alignment Algorithms
Advanced (TBD)
Constitutional AI
Synthetic Data
Evaluation
Open Questions (TBD)
Over-optimization
Style
Chapter Contents
[Incomplete] Evaluation