A plain-English explanation of RLHF (Reinforcement Learning from Human Feedback) — what it means, why it matters, and how it is used in AI.
Also known as: RLHF, preference learning, human preference optimisation
RLHF is used to align language models with human preferences. In outline: human raters compare pairs of model outputs, a reward model is trained to predict which output people preferred, and the language model is then fine-tuned with reinforcement learning (typically PPO) to maximise that learned reward. The aim is a model that is more helpful, harmless, and honest.
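To make the reward-modelling step concrete, here is a minimal PyTorch sketch of training on pairwise preferences with a Bradley-Terry loss. The tiny network, random toy embeddings, and dimension sizes are illustrative assumptions, not any particular production implementation.

```python
# Minimal sketch of the reward-modelling step in RLHF (PyTorch).
# The network, toy data, and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response embedding; stands in for a transformer with a scalar head."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # one scalar reward per example

torch.manual_seed(0)
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy preference data: each row pairs the embedding of the response a
# human preferred ("chosen") with the one they rejected.
chosen = torch.randn(64, 16) + 0.5
rejected = torch.randn(64, 16) - 0.5

for step in range(200):
    # Bradley-Terry loss: push the reward of the chosen response
    # above the reward of the rejected one.
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
# The trained reward model would then supply the reward signal for the
# reinforcement-learning stage (e.g. PPO) that fine-tunes the language model.
```

In a real pipeline the embeddings would come from the language model itself and the reward head would sit on top of it, but the pairwise loss above is the core idea.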
The best way to remember RLHF is to practice unscrambling it. AI Terminology Scrambler uses spaced repetition to help you learn and retain AI vocabulary in just a few minutes a day.
Practice RLHF now →