AI Infrastructure: The Real Engine Behind AI Agents
Struggling with AI projects? The problem isn't your models, it's your AI infrastructure. Learn why data silos & lag hold you back and how to build a better f...
Pingxingjilu
Dive deep into the world of Artificial Intelligence with our curated collection of articles, covering the latest breakthroughs and insights from leading researchers and engineers.
Learn to install and use LLaMA Factory to fine-tune hundreds of LLMs on your local machine. This guide covers CUDA setup, installation, and WebUI usage.
Number in the Moutain
A deep dive into the Transformer architecture, the engine behind modern LLMs. Understand self-attention, encoders, decoders, and how they work together.
Alex Carter
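To make the self-attention step concrete, here is a minimal PyTorch sketch: queries, keys, and values are linear projections of the input, and each output token is an attention-weighted sum of the values. The shapes and random weights are illustrative, not taken from the article.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project inputs to queries, keys, values
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # similarity of every token to every other token
    weights = F.softmax(scores, dim=-1)          # normalize to attention weights per query
    return weights @ v                           # weighted sum of values

seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # torch.Size([4, 8])
```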
A comprehensive guide to using Ollama for running large language models like Llama 3 and Mistral on your local machine. Learn installation, commands, and how to create custom models.
Jordan Lee
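Once Ollama is installed and a model is pulled, it serves a local REST API; a minimal sketch of calling it from Python, assuming the default port and an already-pulled `llama3` model:

```python
import requests

# Assumes a local Ollama server (default port 11434) with the "llama3" model pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the model's full completion
```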
Discover what Large Language Models (LLMs) are and how they power Generative AI. Learn about pre-training, fine-tuning, and the Transformer architecture.
hhmy27
Explore the shift to separated architectures for RL post-training of LLMs. Learn how systems like AsyncFlow & TransferQueue solve data orchestration challenges.
Little Boji
Explore LLM inference optimization on H800 SuperPods. Learn how a disaggregated architecture with SGLang tackles the prefill bottleneck to boost throughput.
yiakwy
Monitoring **PyTorch GPU memory usage** during model training can be perplexing. To demystify this, we'll dive into the **PyTorch memory snapshot** tool, a powerful utility for detailed **GPU memory** analysis.
Panda
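A minimal sketch of the snapshot workflow: record allocation history around a training step, then dump it for the visualizer. These hooks are underscore-prefixed (unstable) PyTorch APIs, and the model here is a stand-in, not the article's example.

```python
import torch

# Underscore-prefixed, so potentially subject to change across PyTorch versions.
torch.cuda.memory._record_memory_history(max_entries=100_000)

# A stand-in for a training step: allocations here are captured with stack traces.
model = torch.nn.Linear(4096, 4096).cuda()
out = model(torch.randn(64, 4096, device="cuda"))
out.sum().backward()

# Dump, then inspect by dragging the file onto https://pytorch.org/memory_viz
torch.cuda.memory._dump_snapshot("snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```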
Discover a critical flaw in Supervised Fine-Tuning (SFT) that limits LLM performance. Learn how a simple learning rate tweak unifies SFT and DPO for a 25% gain.
Noll
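For context on the DPO side of that comparison, here is a minimal sketch of the standard DPO objective on precomputed sequence log-probabilities; `beta` and the numbers are illustrative, and this is not the article's proposed unification.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: the implicit reward is the policy-to-reference log-ratio per response."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Made-up sequence log-probs from a policy and its frozen reference model.
logp_c, logp_r = torch.tensor([-12.0]), torch.tensor([-15.0])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-14.5])
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))
```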
Unpack the powerful workflow behind GraphRAG. Learn how it transforms data into a network of nodes and edges, uses intelligent graph traversal for searching, and applies advanced metrics and metadata filters to deliver highly relevant, contextualized answers.
Mi
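The nodes-and-edges workflow can be sketched with `networkx`: entities become nodes, relations become attributed edges, and retrieval is a traversal with a metadata filter. The entity names, relations, and weight threshold below are toy assumptions, not GraphRAG's actual schema.

```python
import networkx as nx

# Toy knowledge graph: nodes are entities, edges carry relation metadata.
g = nx.Graph()
g.add_node("GraphRAG", kind="technique")
g.add_node("knowledge graph", kind="data structure")
g.add_node("LLM", kind="model")
g.add_edge("GraphRAG", "knowledge graph", relation="retrieves from", weight=0.9)
g.add_edge("GraphRAG", "LLM", relation="augments", weight=0.8)

def expand(graph, seed, min_weight=0.5):
    """Traverse from a seed entity, keeping only edges above a relevance threshold."""
    return [
        (seed, nbr, data["relation"])
        for nbr, data in graph[seed].items()
        if data["weight"] >= min_weight
    ]

print(expand(g, "GraphRAG"))
```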
This article delves into the core challenges of GPU performance, analyzing the differences between compute-bound and memory-bound operations and highlighting the issue of underutilized memory bandwidth. It further proposes strategies to maximize throughput and looks ahead to the collaborative future of CPUs and GPUs, as well as the evolution of GPU architecture, offering a first-principles perspective on understanding and optimizing GPU performance.
xiaodong gong
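The compute-bound versus memory-bound distinction comes down to arithmetic intensity (FLOPs per byte moved) relative to the hardware's ridge point; a back-of-the-envelope sketch with assumed, not vendor-quoted, peak numbers:

```python
# Roofline-style check: an op is memory-bound when its arithmetic intensity
# falls below the hardware's compute/bandwidth ratio (the ridge point).
def arithmetic_intensity_matmul(m, n, k, bytes_per_elem=2):
    flops = 2 * m * n * k                                   # multiply-accumulate count
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

# Illustrative hardware numbers (assumptions, not a specific GPU's spec sheet):
peak_flops = 300e12        # 300 TFLOP/s
peak_bandwidth = 2e12      # 2 TB/s
ridge_point = peak_flops / peak_bandwidth  # ~150 FLOPs/byte

for shape in [(8, 8, 8), (4096, 4096, 4096)]:
    ai = arithmetic_intensity_matmul(*shape)
    bound = "compute-bound" if ai > ridge_point else "memory-bound"
    print(f"matmul {shape}: {ai:.1f} FLOPs/byte -> {bound}")
```

Small matmuls sit far below the ridge point, which is why batching and operator fusion matter so much in practice.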
Traditional reinforcement learning models struggle with real-time applications due to "AI lag." Two ICLR 2025 papers from Mila introduce groundbreaking solutions to tackle inaction and delay regret, enabling large AI models to operate in high-frequency, dynamic environments without compromising speed or intelligence.
Noll
The rise of Agentic AI places unprecedented demands on our infrastructure. This article explores the emerging software and hardware requirements, from specialized runtimes and memory services to zero-trust security models, dissecting AWS's new Bedrock AgentCore platform and discussing the future of AI infrastructure.
Noll
This article dissects the architectural evolution of modern large language models in 2025, moving beyond benchmarks to analyze the core design choices of flagship open-source models. We explore key innovations like DeepSeek-V3's Multi-Head Latent Attention (MLA) and Mixture of Experts (MoE), OLMo 2's unique normalization strategies, Gemma 3's use of sliding window attention, and Llama 4's take on MoE. By focusing on these architectural blueprints, we gain a clearer understanding of the engineering priorities shaping the future of LLMs.
Noll
Learn how to deploy Kimi K2, a state-of-the-art Mixture-of-Experts (MoE) model, on a massive 128 H200 GPU cluster. This guide covers the key challenges and solutions using OME and SGLang for scalable, high-performance inference, achieving 4800 tokens/second with low latency.
Noll
Learn how to select the best ldmatrix operation in CUTLASS CuTe for high-performance GPU matrix multiplication. Optimize data movement and performance.
Noll
Unlock the full potential of your CUDA kernels by mastering memory coalescing with TiledCopy. This article dives deep into optimizing data transfers from Global to Shared Memory on NVIDIA GPUs, covering cp.async, row-major vs. column-major layouts, and cache line alignment to maximize memory bandwidth and accelerate your deep learning workloads.
Noll
Fine-Tuning Qwen3 with Unsloth: Step-by-Step Guide. Qwen3, the latest generation of large language models, is redefining AI with advanced reasoning, instruction following, and robust multilingual support.
Noll
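A minimal sketch of the Unsloth loading step for QLoRA fine-tuning; the model id and hyperparameters below are illustrative assumptions, not the guide's exact recipe.

```python
from unsloth import FastLanguageModel

# Load a quantized Qwen3 checkpoint (model id assumed for illustration).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-8B",
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit quantization to fit consumer GPUs
)

# Attach LoRA adapters; rank and target modules are illustrative choices.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
```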
Baidu ERNIE 4.5: Advancements in Multimodal Large Language Models. Baidu's ERNIE 4.5 marks a major leap in artificial intelligence, especially in the development of **multimodal large language models**.
Noll
MemOS: Persistent Memory for LLMs & Next-Gen AI Agents. SFT transforms LLMs from base models to chat assistants; a step-by-step guide to the SFT workflow, datasets, and best practices.
Noll
Discover how linear layers enable multi-head attention in Transformers, powering advanced NLP models with parallel processing and rich representations.
Noll
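The trick that teaser refers to: one fused linear layer computes every head's projection at once, and a reshape splits the result into per-head subspaces. A minimal PyTorch sketch with illustrative dimensions:

```python
import torch
import torch.nn as nn

batch, seq_len, d_model, n_heads = 2, 10, 512, 8
head_dim = d_model // n_heads

x = torch.randn(batch, seq_len, d_model)
w_qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection in one layer

qkv = w_qkv(x)                  # (batch, seq, 3*d_model)
q, k, v = qkv.chunk(3, dim=-1)  # split into Q, K, V
# Reshape splits d_model into n_heads independent subspaces, no extra matmuls needed.
q = q.view(batch, seq_len, n_heads, head_dim).transpose(1, 2)  # (batch, heads, seq, head_dim)
print(q.shape)                  # torch.Size([2, 8, 10, 64])
```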
Explore single-controller vs multi-controller in veRL, inspired by Google's Pathways, and learn their impact on distributed reinforcement learning systems.
Noll
The field of artificial intelligence has seen rapid advancements in reinforcement learning for reasoning, particularly within large language models (LLMs). This article reviews influential research s...
Noll
Discover how xAI's Grok 4 sets new AI benchmarks, outperforms rivals, and introduces multi-agent systems in the race for next-gen artificial intelligence.
Noll
Qwen3 Training Pipeline: Pre-training, Reinforcement Learning, and Model Distillation. To build a robust foundation, Qwen3 pre-training begins with a comprehensive three-stage ...
Noll
LLM API Market 2024: Key Trends and Model Leaderboard. As we reach the midpoint of 2024, the competitive landscape for large language models (LLMs) is shifting rapidly. The so-called "LLM Wars" ar...
Noll
Discover the technical challenges and solutions in training a 671B parameter LLM with Reinforcement Learning, covering frameworks, memory, and efficiency.
Noll
Discover how traditional infrastructure skills translate to AI infrastructure. Learn key concepts, differences, and engineering fundamentals for LLM systems.
Noll
With its impressive performance and elegant architecture, **SGLang** is rapidly establishing itself in the competitive world of **large language model (LLM) inference**. Could it be the next PyTorch,...
Noll
Why Direct Reinforcement Learning on Base Language Models is the Next Frontier. Direct reinforcement learning (RL) on base language models is emerging as a transformative approach in LLM optimization...
Noll
Learn Andrew Ng
Noll
Reinforcement learning for LLMs (large language models) is revolutionizing the field of artificial intelligence by enabling models to learn beyond the constraints of supervised learning. This article...
Noll
Decoding Strategies for Large Language Models (LLMs). At the core of every large language model (LLM) is a sophisticated process for generating text. Instead of selecting words at random, the model...
Noll
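The common strategies fit in a few lines: greedy decoding takes the argmax, while temperature and top-k sampling trade determinism for diversity. A minimal sketch over one step's vocabulary logits; the vocabulary size and hyperparameters are illustrative.

```python
import torch

def sample_next_token(logits, temperature=0.8, top_k=50):
    """Pick the next token id from raw logits using temperature + top-k sampling."""
    if temperature == 0:                      # greedy decoding
        return int(logits.argmax())
    logits = logits / temperature             # <1 sharpens, >1 flattens the distribution
    topk_vals, topk_idx = logits.topk(top_k)  # keep only the k most likely tokens
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return int(topk_idx[choice])

vocab_logits = torch.randn(32_000)  # fake logits over a 32k-token vocabulary
print(sample_next_token(vocab_logits))
```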
Kimi Researcher: Advancing AI Agents with End-to-End Reinforcement Learning. Kimi Researcher is the flagship product of the Kimi Agent initiative, designed to revolutionize research automation through end-to-end reinforcement learning.
Noll
Qwen3 Model Family: QK-Norm and Enhanced Attention Mechanism. The Qwen3 model family, Alibaba's latest large language model release, introduces a significant upgrade for on-device AI: the adoption of QK-Norm in its attention mechanism.
Noll
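A simplified sketch of the QK-Norm idea: normalize queries and keys (here with a plain RMSNorm) before the dot product, which bounds the attention logits and stabilizes training. Real implementations typically add learnable scales, omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def rms_norm(x, eps=1e-6):
    """RMS normalization over the last dimension (no learnable scale in this sketch)."""
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

seq, head_dim = 6, 64
q, k, v = (torch.randn(seq, head_dim) for _ in range(3))

q, k = rms_norm(q), rms_norm(k)         # the QK-Norm step, before the dot product
scores = (q @ k.T) / head_dim**0.5
out = F.softmax(scores, dim=-1) @ v
print(out.shape)                        # torch.Size([6, 64])
```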