What is RLHF?

A plain-English explanation of RLHF (Reinforcement Learning from Human Feedback) — what it means, why it matters, and how it is used in AI.

Reinforcement Learning from Human Feedback (RLHF) is a training technique in which human evaluators rank or rate model outputs, those preferences are used to train a reward model, and the reward model then provides the signal for fine-tuning the language model with reinforcement learning.
"Human raters compare two AI responses and indicate which is more helpful. This preference data trains a reward model, which guides further training of the language model."

Also known as: preference learning, human preference optimisation
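To make the reward-model step concrete, here is a minimal sketch in PyTorch. The names (`ToyRewardModel`, `preference_loss`) and the hashed bag-of-words scorer are illustrative assumptions for this page, not any particular system's code; production RLHF pipelines use a full pretrained language model with a scalar reward head. The loss is the standard Bradley-Terry pairwise objective, which pushes the score of the human-preferred response above the rejected one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRewardModel(nn.Module):
    """Illustrative scorer for a (prompt + response) string.

    Real systems use a pretrained language model with a scalar head;
    this bag-of-words stand-in just keeps the example self-contained.
    """
    def __init__(self, vocab_size=10_000, dim=32):
        super().__init__()
        self.vocab_size = vocab_size
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pools word vectors
        self.head = nn.Linear(dim, 1)                  # scalar reward

    def forward(self, texts):
        # Hash words into a fixed vocabulary (a stand-in for real tokenisation).
        ids = [torch.tensor([hash(w) % self.vocab_size for w in t.split()])
               for t in texts]
        offsets = torch.tensor([0] + [len(i) for i in ids[:-1]]).cumsum(0)
        return self.head(self.embed(torch.cat(ids), offsets)).squeeze(-1)

def preference_loss(model, prompts, chosen, rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Minimising it pushes the reward of the human-preferred response
    above the rejected one, turning rankings into a reward signal.
    """
    r_chosen = model([p + " " + c for p, c in zip(prompts, chosen)])
    r_rejected = model([p + " " + r for p, r in zip(prompts, rejected)])
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# One preference pair: raters judged the first answer more helpful.
model = ToyRewardModel()
loss = preference_loss(
    model,
    prompts=["How do I boil an egg?"],
    chosen=["Simmer it in water for 7 to 9 minutes."],
    rejected=["Eggs come from chickens."],
)
loss.backward()  # gradients train the scorer to mirror human preferences
```

In a full pipeline, the trained reward model then scores fresh outputs from the language model during reinforcement-learning fine-tuning, most commonly with PPO (Proximal Policy Optimisation), which updates the model to produce higher-reward responses while staying close to its original behaviour.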

Why does RLHF matter?

RLHF is used to align language models with human values and make them more helpful, harmless, and honest. Pretraining alone optimises a model to predict the next token, which does not guarantee useful or safe answers; RLHF adds a direct signal for what people actually prefer, and it was a key ingredient in assistant models such as InstructGPT and ChatGPT.
