Lesson 17 of 20 · Smart Helpers
Fine-tuning and RLHF
After pre-training, models are further fine-tuned using reinforcement learning from human feedback (RLHF). Human reviewers rank candidate responses, a reward model learns those preferences, and the model is then optimized to prefer helpful, safe outputs, as sketched in the code below.
- RLHF uses human rankings of model outputs as a training signal to improve the model.
- Fine-tuning aligns a model's behavior with human preferences for helpfulness and safety.
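To make the ranking step concrete, here is a minimal, hypothetical sketch in PyTorch of how a reward model can be trained from human preference pairs. The tiny model, embedding size, and random data are made up for illustration; this is not a full RLHF pipeline.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Toy stand-in for an LLM-based reward model: maps a response embedding to a scalar score."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        return self.score(x).squeeze(-1)

reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend embeddings for responses a human ranked higher (chosen) vs. lower (rejected).
chosen = torch.randn(8, 16)    # batch of 8 preferred-response embeddings (illustrative only)
rejected = torch.randn(8, 16)  # batch of 8 rejected-response embeddings

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Pairwise ranking loss: push the chosen response's score above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model then supplies the reward signal used to fine-tune the LLM with RL (e.g., PPO).
```

The key idea is the pairwise loss: rather than labeling any single response as "good", the reward model only needs humans to say which of two responses is better.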
Think about it
What does it mean to fine-tune an LLM?
