Latest Articles

Dive deep into the world of Artificial Intelligence with our curated collection of articles, covering the latest breakthroughs and insights from leading researchers and engineers.

Technology

AI Infrastructure: The Real Engine Behind AI Agents

Struggling with AI projects? The problem isn't your models, it's your AI infrastructure. Learn why data silos & lag hold you back and how to build a better f...

Pingxingjilu

AI infrastructure, AI agents, data-to-AI pipeline
Technology

LLaMA Factory: Easy LLM Fine-Tuning Guide

Learn to install and use LLaMA Factory to fine-tune hundreds of LLMs on your local machine. This guide covers CUDA setup, installation, and WebUI usage.

Number in the Moutain

LLaMA Factory, LLM fine-tuning, fine-tune LLM
Foundational Concepts

What Is a Transformer Model? An In-Depth Guide

A deep dive into the Transformer architecture, the engine behind modern LLMs. Understand self-attention, encoders, decoders, and how they work together.

Alex Carter

Transformer model, Transformer architecture, Transformers AI
Tools & Frameworks

Ollama Guide: Run LLMs Locally on Your Machine

A comprehensive guide to using Ollama for running large language models like Llama 3 and Mistral on your local machine. Learn installation, commands, and how to create custom models.

Jordan Lee

Ollama, run LLMs locally, Ollama guide
Getting Started

What Are LLMs? A Guide to Generative AI

Discover what Large Language Models (LLMs) are and how they power Generative AI. Learn about pre-training, fine-tuning, and the Transformer architecture.

hhmy27

Large Language Models (LLMs), Generative AI, Transformer architecture
Technology

Separated Architectures for LLM RL Post-Training

Explore the shift to separated architectures for RL post-training of LLMs. Learn how systems like AsyncFlow & TransferQueue solve data orchestration challenges.

Little Boji

RL post-training, separated architecture, LLM post-training
Technology

LLM Inference on H800: A Disaggregated Architecture Guide

Explore LLM inference optimization on H800 SuperPods. Learn how a disaggregated architecture with SGLang tackles the prefill bottleneck to boost throughput.

yiakwy

LLM inference, disaggregated architecture, H800 SuperPod
Technology

PyTorch Memory Snapshot: A Guide to GPU Usage Analysis

Monitoring PyTorch GPU memory usage during model training can be perplexing. To demystify this, we'll dive into the PyTorch memory snapshot tool, a powerful utility for detailed GPU memory ...

Panda

PyTorch memory snapshot, GPU memory analysis, PyTorch memory usage
Technology

SFT Flaw: A Learning Rate Tweak Unlocks LLM Potential

Discover a critical flaw in Supervised Fine-Tuning (SFT) that limits LLM performance. Learn how a simple learning rate tweak unifies SFT and DPO for a 25% gain.

Noll

Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), LLM fine-tuning
General

GraphRAG's Workflow

Unpack the powerful workflow behind GraphRAG. Learn how it transforms data into a network of nodes and edges, uses intelligent graph traversal for searching, and applies advanced metrics and metadata filters to deliver highly relevant, contextualized answers.

Mi

Technology, AI, Innovation
Technology

First Principles of GPU Performance

This article delves into the core challenges of GPU performance, analyzing the differences between compute-bound and memory-bound operations and highlighting the issue of underutilized memory bandwidth. It further proposes strategies to maximize throughput and looks ahead to the collaborative future of CPUs and GPUs, as well as the evolution of GPU architecture, offering a first-principles perspective on understanding and optimizing GPU performance.

xiaodong gong

Technology, AI, Innovation
Technology

Two Major Challenges in Reinforcement Learning Finally Solved by ICLR Papers

Traditional reinforcement learning models struggle with real-time applications due to "AI lag." Two ICLR 2025 papers from Mila introduce groundbreaking solutions to tackle inaction and delay regret, enabling large AI models to operate in high-frequency, dynamic environments without compromising speed or intelligence.

Noll

Technology, AI, Innovation
Technology

Infrastructure Requirements of Agentic AI

The rise of Agentic AI places unprecedented demands on our infrastructure. This article explores the emerging software and hardware requirements, from specialized runtimes and memory services to zero-trust security models, dissecting AWS's new Bedrock AgentCore platform and discussing the future of AI infrastructure.

Noll

Technology, AI, Innovation
Technology

From DeepSeek-V3 to Kimi K2: Eight Modern Large Language Model Architecture Designs

This article dissects the architectural evolution of modern large language models in 2025, moving beyond benchmarks to analyze the core design choices of flagship open-source models. We explore key innovations like DeepSeek-V3's Multi-Head Latent Attention (MLA) and Mixture of Experts (MoE), OLMo 2's unique normalization strategies, Gemma 3's use of sliding window attention, and Llama 4's take on MoE. By focusing on these architectural blueprints, we gain a clearer understanding of the engineering priorities shaping the future of LLMs.

Noll

Technology, AI, Innovation
Technology

Deploying Kimi K2: Scalable MoE Model on 128 GPUs

Learn how to deploy Kimi K2, a state-of-the-art Mixture-of-Experts (MoE) model, on a massive 128 H200 GPU cluster. This guide covers the key challenges and solutions using OME and SGLang for scalable, high-performance inference, achieving 4800 tokens/second with low latency.

Noll

Kimi K2 deployment, Mixture-of-Experts model, OME
Technology

How to Choose the Right ldmatrix in CUTLASS CuTe

Learn how to select the best ldmatrix operation in CUTLASS CuTe for high-performance GPU matrix multiplication. Optimize data movement and performance.

Noll

CUTLASS CuTe, ldmatrix operation, TiledMMA
Technology

Optimizing TiledCopy for Memory Coalescing on NVIDIA GPUs

Unlock the full potential of your CUDA kernels by mastering memory coalescing with TiledCopy. This article dives deep into optimizing data transfers from Global to Shared Memory on NVIDIA GPUs, covering cp.async, row-major vs. column-major layouts, and cache line alignment to maximize memory bandwidth and accelerate your deep learning workloads.

Noll

TiledCopy, memory coalescing, cp.async
Technology

Fine-Tune Qwen3 with Unsloth: Fast, Efficient AI Training

Qwen3, the latest generation of large language models, is redefining AI with advanced reasoning, instruction following, and robust multilingual s...

Noll

Qwen3 fine-tuning, Unsloth, LoRA
Technology

Baidu ERNIE 4.5: Multimodal Model Training & Fine-Tuning

Baidu's ERNIE 4.5 marks a major leap in artificial intelligence, especially in the development of multimodal large language mode...

Noll

ERNIE 4.5, multimodal large language models, Baidu
Technology

MemOS: Persistent Memory for LLMs & Next-Gen AI Agents

Noll

MemOS, LLM memory management, persistent memory for LLMs
Technology

Supervised Fine-Tuning (SFT) for LLMs: A Practical Guide

Learn how Supervised Fine-Tuning (SFT) transforms LLMs from base models to chat assistants. Step-by-step guide to SFT workflow, datasets, and best practices.

Noll

Supervised Fine-Tuning, LLM fine-tuning, instruction-tuned model
Technology

Single vs Multi-Controller in veRL: Pathways to RL

Explore single-controller vs multi-controller in veRL, inspired by Google's Pathways, and learn their impact on distributed reinforcement learning systems.

Noll

single-controller, multi-controller, veRL
Technology

Reinforcement Learning for LLM Reasoning: Trends & Insights

The field of artificial intelligence has seen rapid advancements in reinforcement learning for reasoning, particularly within large language models (LLMs). This article reviews influential research s...

Noll

reinforcement learning for reasoning, RL-based reasoning in large language models, GRPO
Technology

Qwen3 Training Pipeline: Pre-training, RL, and Distillation

Qwen3 training begins with a comprehensive three-stage ...

Noll

Qwen3 training, Qwen3 pre-training, Qwen3 reinforcement learning
Technology

LLM API Market 2024: Google Leads as OpenAI Slips

As we reach the midpoint of 2024, the competitive landscape for large language models (LLMs) is shifting rapidly. The so-called "LLM Wars" ar...

Noll

LLM API market, large language models, OpenRouter
Technology

Training a 671B LLM with Reinforcement Learning: Key...

Discover the technical challenges and solutions in training a 671B parameter LLM with Reinforcement Learning, covering frameworks, memory, and efficiency.

Noll

671B parameter LLM, Reinforcement Learning, large language model training
Technology

Bridging Traditional and AI Infrastructure: Key Differences

Discover how traditional infrastructure skills translate to AI infrastructure. Learn key concepts, differences, and engineering fundamentals for LLM systems.

Noll

AI infrastructure, traditional infrastructure, distributed systems
Technology

SGLang Disaggregated LLM Inference: Architecture &...

With its impressive performance and elegant architecture, SGLang is rapidly establishing itself in the competitive world of large language model (LLM) inference. Could it be the next PyTorch,...

Noll

SGLang, LLM inference, disaggregated inference
Technology

Direct Reinforcement Learning on Base LLMs: The Next Leap

Direct reinforcement learning (RL) on base language models is emerging as a transformative approach in LLM optimiza...

Noll

direct reinforcement learning, base language models, zero-RL
Technology

Reinforcement Learning for LLMs: An Intuitive Guide

Reinforcement learning for LLMs (large language models) is revolutionizing the field of artificial intelligence by enabling models to learn beyond the constraints of supervised learning. This article...

Noll

reinforcement learning for LLMs, RL for large language models, supervised learning
Technology

Decoding Strategies for Large Language Models Explained

At the core of every large language model (LLM) is a sophisticated process for generating text. Instead of selecting words at random, the model...

Noll

large language model decoding strategies, LLM sampling methods, Greedy Search
Technology

Kimi Researcher: End-to-End RL for Advanced AI Agents

Kimi Researcher is the flagship product of the Kimi Agent initiative, designed to revolutionize research automation thr...

Noll

Kimi Researcher, end-to-end reinforcement learning, AI agent
Technology

Qwen3 QK-Norm: Improved On-Device AI Stability

The Qwen3 model family, Alibaba's latest large language model release, introduces a significant upgrade for on-device AI: the adoption...

Noll

Qwen3, QK-Norm, attention mechanism