Fine-Tune Qwen3 with Unsloth: Fast, Efficient AI Training

Noll · 3 min read

Fine-Tuning Qwen3 with Unsloth: Step-by-Step Guide

Qwen3, the latest generation of large language models, is redefining AI with advanced reasoning, instruction following, and robust multilingual support. With Unsloth, developers can fine-tune Qwen3 models at unprecedented speed and efficiency—even on limited hardware.

Why Fine-Tune Qwen3 with Unsloth?

The latest Qwen3 models set new benchmarks in reasoning, instruction following, and multilingual capabilities. Unsloth enables you to fine-tune these models at twice the speed and with 70% less memory, supporting an 8x longer context length compared to traditional methods.

Balancing Reasoning and Conversation in Qwen3 Fine-Tuning

Qwen3 operates in both reasoning and conversational modes. The challenge: fine-tune for conversation without sacrificing reasoning skills. Training exclusively on conversational data can diminish Qwen3's logical performance.

Optimal Data Mix:

  • 75% reasoning data
  • 25% conversational data

This ratio preserves analytical performance while enhancing conversational fluency. Adjust as needed for your application.

1. Install Dependencies for LoRA Fine-Tuning

We'll use Low-Rank Adaptation (LoRA) for efficient Qwen3 fine-tuning. LoRA introduces trainable low-rank matrices into each model layer, updating only a small subset of parameters (1-10%). This reduces memory and computation requirements while maintaining performance.
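Dependencies are typically installed with `pip install unsloth` (check Unsloth's install guide for the exact command for your CUDA version). To see where the parameter savings come from: LoRA freezes a weight matrix W of shape d×d and learns a pair B (d×r) and A (r×d), so the trainable count per adapted matrix drops from d² to 2dr. A rough sketch of the arithmetic, with illustrative (loosely Qwen-sized) dimensions:

```python
# LoRA parameter-count sketch: a frozen d x d weight W is adapted as W + B @ A,
# where B is d x r and A is r x d. Only B and A are trained.
def lora_trainable_fraction(d: int, r: int) -> float:
    full = d * d              # parameters in the frozen weight matrix
    lora = 2 * d * r          # parameters in the low-rank pair (B, A)
    return lora / full

# Illustrative hidden size and a typical LoRA rank
d, r = 4096, 16
frac = lora_trainable_fraction(d, r)
print(f"trainable fraction per adapted matrix: {frac:.4%}")  # roughly 0.78%
```

The overall trained share of the model lands higher (the 1-10% range above) because adapters are attached to many layers and rank choices vary.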

2. Prepare Your Qwen3 Training Data

To leverage Qwen3's dual-mode capability, blend two datasets:

Reasoning Dataset

  • Open Math Reasoning dataset (AIMO Progress Prize 2 winner)
  • 10% sample of DeepSeek R1's verifiable reasoning trajectories (95%+ accuracy)

Conversational Dataset

  • FineTome-100k by Maxime Labonne (ShareGPT format)
  • Convert to Hugging Face's multi-turn conversation format

Target dataset structure:

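One possible shape for each training example, following Hugging Face's multi-turn role/content convention (the `conversations` field name is an assumption; adjust to your dataset's column):

```python
# Each example is a list of role/content messages, the format expected by
# Hugging Face chat templates.
example = {
    "conversations": [
        {"role": "user", "content": "Solve 12 * 7 + 5."},
        {"role": "assistant", "content": "12 * 7 = 84, and 84 + 5 = 89."},
    ]
}

roles = [m["role"] for m in example["conversations"]]
print(roles)  # ['user', 'assistant']
```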

Processing reasoning data:

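A minimal sketch of turning each problem/solution pair into that conversation shape (the column names `problem` and `generated_solution` are assumptions; verify them against the actual OpenMathReasoning schema):

```python
# Convert a reasoning example (problem + worked solution) into a two-turn
# conversation so it matches the chat-formatted data.
def to_conversation(row: dict) -> dict:
    return {
        "conversations": [
            {"role": "user", "content": row["problem"]},
            {"role": "assistant", "content": row["generated_solution"]},
        ]
    }

row = {"problem": "What is 2 + 2?", "generated_solution": "2 + 2 = 4."}
conv = to_conversation(row)
print(conv["conversations"][0]["role"])  # user
```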

Standardizing chat data with Unsloth:

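Unsloth ships a `standardize_sharegpt` helper for this step; conceptually it maps ShareGPT's `from`/`value` keys onto `role`/`content`, roughly like this sketch:

```python
# ShareGPT stores turns as {"from": "human"/"gpt", "value": ...}; Hugging Face
# chat templates expect {"role": "user"/"assistant", "content": ...}.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_messages(turns: list) -> list:
    return [{"role": ROLE_MAP[t["from"]], "content": t["value"]} for t in turns]

sharegpt = [
    {"from": "human", "value": "Hi!"},
    {"from": "gpt", "value": "Hello, how can I help?"},
]
print(sharegpt_to_messages(sharegpt)[0])  # {'role': 'user', 'content': 'Hi!'}
```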

Dataset balance check:

  • 19,252 reasoning examples
  • 100,000 chat examples

To prevent reasoning skills from being diluted, mix datasets at a 75:25 reasoning-to-chat ratio.
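With all 19,252 reasoning examples kept and fixed at 75% of the mix, the chat subsample works out to about 6,417 examples. The arithmetic, sketched:

```python
# Keep every reasoning example and subsample the chat set so the final mix
# is 75% reasoning / 25% chat.
reasoning_count = 19_252
chat_ratio = 0.25

# reasoning_count / (reasoning_count + chat_count) = 0.75  =>  solve for chat_count
chat_count = int(reasoning_count * chat_ratio / (1 - chat_ratio))
print(chat_count)  # 6417
```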

3. Train Qwen3 Using SFTTrainer

Use Hugging Face's TRL library and its SFTTrainer for training. For demonstration, train for just 60 steps; for a full run, set num_train_epochs=1 and drop the max_steps cap so training covers the entire dataset.
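A typical trainer setup, sketched from TRL's `SFTTrainer`/`SFTConfig` API (the `model` and `combined_dataset` names refer to the objects built in the earlier steps; hyperparameter values are illustrative, not tuned, so check the TRL docs for current argument names):

```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,                      # the LoRA-wrapped Qwen3 model from step 1
    train_dataset=combined_dataset,   # the 75:25 reasoning/chat mix from step 2
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,                 # demo run; remove for a full epoch
        num_train_epochs=1,           # used once max_steps is not set
        learning_rate=2e-4,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```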

4. Run Inference with Your Fine-Tuned Qwen3 Model

Unsloth's inference API makes testing easy. Adjust parameters for different modes:

  • Reasoning Mode: temperature=0.6, top_p=0.95, top_k=20
  • Conversational Mode: temperature=0.7, top_p=0.8, top_k=20

To activate reasoning, set enable_thinking=True. The model's step-by-step reasoning will be more explicit in this mode.
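The two recommended presets can be kept as a small lookup; `enable_thinking` is the flag Qwen3 chat templates use to toggle the explicit reasoning block. A sketch (pass the returned values through to your generation call):

```python
# Qwen3's recommended sampling presets per mode (values from the guide above).
SAMPLING = {
    "reasoning":      {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "conversational": {"temperature": 0.7, "top_p": 0.8,  "top_k": 20},
}

def generation_kwargs(mode: str) -> dict:
    params = dict(SAMPLING[mode])
    # enable_thinking is consumed by the chat template, not the sampler
    params["enable_thinking"] = (mode == "reasoning")
    return params

print(generation_kwargs("reasoning")["temperature"])  # 0.6
```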

Saving Your Fine-Tuned Qwen3 Model

Save Only LoRA Adapters

This saves just the adapter weights (not the full base model). For a complete, standalone model, see the next steps.

Merge and Save in float16 Format

Merge the LoRA adapters with the base model and save the result in float16; a backup of the adapters is saved as well. Toggle the code blocks by changing if False to if True.

Merge and Save in GGUF Format

For compatibility with tools like llama.cpp, Jan, and Open WebUI, save in GGUF format. Unsloth supports quantization methods like q8_0 and q4_k_m.

You can now load quantized model files (e.g., model-unsloth.gguf, model-unsloth-Q4_K_M.gguf) into any GGUF-compatible application.
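The three export paths above can sit side by side and be toggled with `if False` guards, mirroring the Unsloth notebooks (a sketch: the `model`/`tokenizer` objects come from the earlier steps, and the method and argument names follow Unsloth's documented API, so verify them against the current docs):

```python
# Each export is disabled by default; flip `if False` to `if True` to run it.
QUANT_METHODS = ["q8_0", "q4_k_m"]  # GGUF quantizations mentioned above

if False:  # LoRA adapters only (small; needs the base model at load time)
    model.save_pretrained("lora_model")
    tokenizer.save_pretrained("lora_model")

if False:  # merge adapters into the base model and save as float16
    model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")

if False:  # export to GGUF for llama.cpp / Jan / Open WebUI
    model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
```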

Next Steps: Customizing Qwen3 for Your Needs

You've successfully fine-tuned a Qwen3 model with Unsloth. For best results, continue testing and iterating with your own domain-specific datasets and evaluation metrics. Explore further by integrating custom data and leveraging the Hugging Face and Unsloth ecosystems.


About This Article

Topic: Technology
Difficulty: Intermediate
Reading Time: 3 minutes
Last Updated: July 19, 2025

