Infrastructure
Infrastructure Center
Master AI system architecture from GPU optimization to distributed training and scalable deployment
Infrastructure Fundamentals
Bridging Traditional and AI Infrastructure: Key Differences
Learn how traditional infrastructure skills translate to AI systems and understand key concepts
AI Infrastructure: The Real Engine Behind AI Agents
Understanding data pipelines, infrastructure requirements, and system design for AI applications
GPU & Performance Optimization
PyTorch Memory Snapshot: A Guide to GPU Usage Analysis
Master GPU memory profiling and optimization techniques for PyTorch applications
First Principles of GPU Performance
Deep dive into GPU architecture and performance optimization fundamentals
Optimizing TiledCopy for Memory Coalescing on NVIDIA GPUs
Advanced CUDA optimization techniques for memory coalescing and data transfers
LLM Serving & Deployment
Boost LLM Goodput: Prefill-Decode Separation
Learn how the DistServe architecture optimizes LLM serving with up to 4.48x higher goodput
LLM Inference on H800: A Disaggregated Architecture
Explore disaggregated architecture patterns for large-scale LLM inference
Deploying Kimi K2: A Scalable MoE Model
Practical guide to deploying Mixture-of-Experts models on GPU clusters
SGLang Disaggregated LLM Inference Architecture
Understanding SGLang's approach to distributed LLM inference
Training Infrastructure
30x Faster LLM RL Training: The Checkpoint-Engine Story
Optimization techniques that reduced parameter update time from 10 minutes to 20 seconds
Separated Architectures for LLM RL Post-Training
Advanced architectures for efficient reinforcement learning post-training