What is LLaMA Factory and why use it?

LLaMA Factory is a no-code toolkit for fine-tuning 100+ LLMs (LLaMA 3, Mistral, Qwen, Yi, Gemma). It features a WebUI interface requiring zero coding, supports efficient methods like LoRA/QLoRA to reduce GPU memory by 80%, and includes built-in optimizations like FlashAttention-2. Ideal for researchers and developers who want to customize LLMs on local GPUs without writing training scripts.

What GPU do I need to run LLaMA Factory?

Minimum: NVIDIA GPU with 8GB VRAM (RTX 3060) for QLoRA fine-tuning of 7B models. Recommended: RTX 3090/4090 (24GB VRAM) or A100 for full-parameter training. LLaMA Factory supports 2/3/4/5/6/8-bit quantization to reduce memory usage. CUDA 12.1+ required. For CPU-only systems, training is extremely slow and not practical.

How do I install CUDA for LLaMA Factory?

1) Check GPU compatibility at nvidia.com/cuda-gpus, 2) Download CUDA Toolkit 12.1 from nvidia.com/cuda-toolkit-archive (matches PyTorch dependency), 3) Run installer and uncheck Driver if already installed, 4) Verify installation with 'nvcc --version'. Linux users need GCC compiler. Uninstall old CUDA versions first to avoid conflicts. Full step-by-step guide in article.

What LLM models does LLaMA Factory support?

LLaMA Factory supports 100+ models including: LLaMA 2/3 (7B-70B), Mistral/Mixtral-MoE, Qwen 1.5/2/2.5, Yi, Gemma, Baichuan, ChatGLM, DeepSeek. Also supports vision models like LLaVA. You can fine-tune with pre-training, instruction tuning, reward modeling, PPO, DPO, and ORPO methods. All models can use LoRA/QLoRA for efficient training on consumer GPUs.

Can I fine-tune LLaMA 3 without coding experience?

Yes! LLaMA Factory's WebUI (LlamaBoard) requires zero coding. Steps: 1) Install CUDA and Python dependencies, 2) Launch WebUI with 'llamafactory-cli webui', 3) Select model and dataset from dropdown menus, 4) Configure training parameters (LoRA rank, learning rate) through web interface, 5) Click 'Start' to begin training, 6) Export fine-tuned model. No Python/PyTorch knowledge needed for basic fine-tuning.

Fine-Tune LLaMA 3 in 3 Steps: No-Code LLM Training (2025 LLaMA Factory)

LLaMA Factory Guide Cover Image

Fine-tuning Large Language Models (LLMs) is the key to creating powerful, custom AI solutions. However, this process has traditionally been a complex, code-intensive task reserved for machine learning experts. LLaMA Factory is a powerful, user-friendly LLM fine-tuning toolkit that changes the game. It simplifies the entire process, empowering researchers and developers to customize hundreds of pre-trained models on their local machines—often without writing a single line of code.

This guide provides a complete walkthrough of how to use LLaMA Factory, from installation to exporting your first custom LLM.

What is LLaMA Factory?

LLaMA Factory is a comprehensive toolkit that offers:

Broad Model Support: Fine-tune LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, and many more.
Versatile Training Methods: Supports Pre-training, Instruction Fine-Tuning, Reward Model Training, PPO, DPO, and ORPO.
Efficient Computation: Full-parameter, frozen, LoRA, and QLoRA (2, 3, 4, 5, 6, 8-bit) fine-tuning.
Advanced Optimization: Integrates GaLore, BAdam, DoRA, LongLoRA, and other state-of-the-art algorithms.
Built-in Acceleration: Out-of-the-box support for FlashAttention-2 and Unsloth to speed up LLM training.
Flexible Inference: Use Transformers or vLLM for model inference.
Experiment Monitoring: Seamlessly track your training with LlamaBoard, TensorBoard, Wandb, and MLflow.

Step 1: Install CUDA for LLM Training

Before you can fine-tune an LLM with LLaMA Factory, you must install the NVIDIA CUDA Toolkit. This platform gives the toolkit direct access to your GPU's processing power, which is essential for training.

CUDA Installation on Linux

Check Compatibility: First, confirm your GPU is CUDA-capable at https://developer.nvidia.com/cuda-gpus. Then, verify your Linux version is supported by running:
```
uname -m && cat /etc/*release
```
Verify GCC: Ensure you have the GCC compiler by running:
```
gcc --version
```
Download CUDA Toolkit: We recommend CUDA 12.1 to align with the project's PyTorch dependency. Download the correct installer for your system from the CUDA Toolkit Archive: https://developer.nvidia.com/cuda-toolkit-archive.

Screenshot of CUDA Toolkit download page

Screenshot of running CUDA installer on Linux

Screenshot of CUDA installer options showing unchecked Driver selection

If you have an older CUDA version, uninstall it first. Try the official uninstaller:

sudo /usr/local/cuda-12.0/bin/cuda-uninstaller

If that fails, remove it manually:

sudo rm -rf /usr/local/cuda-12.0

Run the Installer: Execute the downloaded file and follow the prompts. For CUDA 12.1.1, the commands are:

wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run

Important: During installation, uncheck the "Driver" option unless you are certain it's compatible with your GPU to avoid common issues.

Terminal output of nvcc -V showing CUDA version

Verify Installation: Once finished, run:
```
nvcc -V
```
A successful installation will display the CUDA version.

Screenshot of CUDA version confirmation in terminal

CUDA Installation on Windows

Check OS Version: Go to Settings > About and check your Windows version against the supported list (e.g., Windows 11 23H2, Windows 10 22H2).
Download and Install: Select the corresponding CUDA 12.1 version from the NVIDIA developer website, download it, and run the installer.
Verify Installation: Open Command Prompt (cmd) and run:
```
nvcc -V
```
If the command returns the version number, your CUDA installation is complete. If not, you may need to add the CUDA path to your system's environment variables.

Step 2: Install LLaMA Factory

With CUDA ready, you can now install the LLaMA Factory toolkit. Ensure you have the following prerequisites:

Python 3.10
PyTorch 2.1.2
CUDA 12.1

Run these commands to clone the repository and install dependencies:

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .[torch,deepspeed]

To verify the installation, run:

llamafactory-cli version

Terminal showing successful llamafactory-cli version output

Screenshot of LLaMA Factory WebUI main interface

Enabling QLoRA for Efficient Training

To use Quantized LoRA (QLoRA) on Windows for memory-efficient fine-tuning, install a compatible bitsandbytes wheel:

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl

Activating FlashAttention-2 for Speed

To accelerate training with FlashAttention-2 on Windows, install the appropriate flash-attention wheel for your CUDA version.

Installing Optional Dependencies

LLaMA Factory is modular. Install only what you need for your specific LLM fine-tuning task:

# For Intel Arc GPUs
pip install -e .[intel]

# For Ascend NPU devices
pip install -e .[ascend]

# For exporting to GGUF format
pip install -e .[gguf]

# For exporting to AWQ format
pip install -e .[awq]

# For all dependencies
pip install -e .[all]

Step 3: Fine-Tune Your First LLM with the WebUI

Now for the exciting part: using the LLaMA Factory WebUI for code-free fine-tuning. Launch it with this command:

llamafactory-cli webui

The WebUI is organized into four tabs: Train, Evaluate & Predict, Chat, and Export.

How to Start Training a Model

To start a fine-tuning job, follow these steps in the Train tab:

Select Model: Choose a base model name and path.
Choose Method: Select the training stage (e.g., SFT) and fine-tuning method (e.g., LoRA).
Load Dataset: Pick one or more training datasets.
Set Hyperparameters: Configure the learning rate, number of epochs, etc.
Configure Parameters: Adjust fine-tuning settings as needed.
Define Output: Specify the output directory.
Begin Training: Click Start to launch the process.

Note on Custom Datasets: To use your own data for a custom LLM, add its description to the data/data_info.json file and ensure your dataset file follows the expected format.

How to Evaluate and Predict with Your Model

Once your model is trained, go to the Evaluate & Predict tab. Here, you can benchmark its performance on a dataset by providing the base Model and your trained Adapter.

How to Chat with Your Fine-Tuned LLM

The Chat tab lets you interact with your custom LLM directly. Load your Model and Adapter, choose an Inference engine, and start a conversation to test its capabilities.

How to Export Your Custom LLM for Deployment

When you are satisfied with your model, use the Export tab to package it for deployment. Set the base Model, your Adapter, Max shard size, and any quantization settings, then click Export.

Why Use LLaMA Factory for LLM Fine-Tuning?

LLaMA Factory demystifies the art of fine-tuning, packaging a suite of sophisticated tools into an accessible web interface. By handling the complexities of training, evaluation, and deployment, it allows you to focus on what truly matters: experimenting with and building powerful, customized AI models. Whether you're a seasoned researcher or just starting your journey, LLaMA Factory provides a robust and efficient platform to fine-tune any LLM and bring your ideas to life.

Reinforcement Learning Hub