Reinforcement Learning Center

Master reinforcement learning for LLMs, from RLHF fundamentals to advanced training techniques

Training Cost Tools

LLM Token Calculator - Estimate RL Training Costs

Calculate token costs for RLHF, DPO, and other reinforcement learning training methods across different models

Reinforcement Learning for LLMs: Intuitive Guide

2025-07-03

Comprehensive introduction to reinforcement learning concepts and applications in LLMs

Direct Reinforcement Learning Base LLMs Next

2025-07-06

Explore direct RL approaches for training large language models from scratch

Replicate DeepSeek R1 with RL: A Guide

2025-09-03

Build a complete RL pipeline from scratch using GRPO for advanced LLM reasoning

GRPO Training Pipeline: SFT to RL for Better Reasoning

2025-09-05

Complete guide covering SFT with cold-start data, CoT prompting, and GRPOTrainer

Training a 671B LLM with Reinforcement Learning

2025-07-09

Insights into training large-scale models with reinforcement learning techniques

DeepSeek-Coder-V2's Reward Model Explained

2025-09-04

Deep dive into modular reward functions for accuracy, reasoning, and format

GRPO-RoC: Better Training for Tool-Augmented AI

2025-09-16

Advanced training method improving AI reasoning through high-quality data curation

Two Major Challenges in Reinforcement Learning Solved

2025-07-24

ICLR 2025 papers solving inaction and delay regret in real-time RL applications

Separated Architectures for LLM RL Post Training

2025-07-30

Advanced architectures and TransferQueue patterns for efficient post-training

Supervised Fine-Tuning: A Guide to LLM Reasoning

2025-09-07

Complete pipeline for enhancing LLM reasoning, from SFT to knowledge distillation

SFT Flaw: A Learning Rate Tweak

2025-07-27

Critical insights into learning rate optimization for supervised fine-tuning

Supervised Fine-Tuning (SFT) for LLMs: Practical Guide

2025-07-16

Transform base models into chat assistants with datasets and best practices

OpenAI Spinning Up in Deep RL

Educational resource for learning deep reinforcement learning fundamentals

Hugging Face TRL Documentation

Transformer Reinforcement Learning library for training LLMs with RL

DeepMind's RL Course

Comprehensive course on reinforcement learning from DeepMind
