Reinforcement Learning
Reinforcement Learning Center
Master reinforcement learning for LLMs, from RLHF fundamentals to advanced training techniques
RL Fundamentals
Reinforcement Learning for LLMs: Intuitive Guide
Comprehensive introduction to reinforcement learning concepts and applications in LLMs
Direct Reinforcement Learning Base LLMs Next
Explore direct RL approaches for training large language models from scratch
Training Pipelines & Methods
Replicate DeepSeek R1 with RL: A Guide
Build a complete RL pipeline from scratch using GRPO for advanced LLM reasoning
GRPO Training Pipeline: SFT to RL for Better Reasoning
Complete guide covering SFT with cold-start data, CoT prompting, and GRPOTrainer
Training a 671B LLM with Reinforcement Learning
Insights into training large-scale models with reinforcement learning techniques
Reward Models & Optimization
DeepSeek-Coder-V2's Reward Model Explained
Deep dive into modular reward functions for accuracy, reasoning, and format
GRPO-RoC: Better Training for Tool-Augmented AI
Advanced training method improving AI reasoning through high-quality data curation
Advanced Techniques
Fine-Tuning & SFT
Supervised Fine-Tuning: A Guide to LLM Reasoning
Complete pipeline for enhancing LLM reasoning, from SFT to knowledge distillation
SFT Flaw: A Learning Rate Tweak
Critical insights into learning rate optimization for supervised fine-tuning
Supervised Fine-Tuning (SFT) for LLMs: Practical Guide
Transform base models into chat assistants with datasets and best practices
Curated Resources
OpenAI Spinning Up in Deep RL
Educational resource for learning deep reinforcement learning fundamentals
Hugging Face TRL Documentation
Transformer Reinforcement Learning library for training LLMs with RL
DeepMind's RL Course
Comprehensive course on reinforcement learning from DeepMind