Search Articles

Search results for “Transformer”

Found 7 articles

First Principles of GPU Performance

By xiaodong gong in Technology
This article examines the core challenges of GPU performance, contrasting compute-bound and memory-bound operations and highlighting the problem of underutilized memory bandwidth. It then proposes strategies for maximizing throughput and looks ahead to closer collaboration between CPUs and GPUs and the continued evolution of GPU architecture, offering a first-principles perspective on understanding and optimizing GPU performance.
Technology · AI · Innovation

From DeepSeek-V3 to Kimi K2: Eight Modern Large Language Model Architecture Designs

By Noll in Technology
This article dissects the architectural evolution of modern large language models in 2025, moving beyond benchmarks to analyze the core design choices of flagship open-source models. We explore key innovations like DeepSeek-V3's Multi-Head Latent Attention (MLA) and Mixture of Experts (MoE), OLMo 2's unique normalization strategies, Gemma 3's use of sliding window attention, and Llama 4's take on MoE. By focusing on these architectural blueprints, we gain a clearer understanding of the engineering priorities shaping the future of LLMs.
Technology · AI · Innovation