A Little Bit of Reinforcement Learning from Human Feedback
A short introduction to RLHF and post-training focused on language models.
Nathan Lambert
Instruction Tuning