MoE Post-Training Guide: Load Balancing, Routing Replay, and Expert Parallelism
A practical guide to Mixture-of-Experts (MoE) post-training, covering the tradeoff between load balancing and task quality, why reinforcement learning (RL) becomes unstable when routing changes across engines or policy versions, and how to choose expert parallelism (EP) versus expert tensor parallelism (ETP) in large-scale deployments.
Qing Ke Ai