Qwen3-30B-A3B-Base: Open-Source Large Language Model for Language Interaction and Processing

Qwen3 30B A3B Base

Developed by unsloth

Qwen3-30B-A3B-Base belongs to Qwen3, the latest generation of large language models in the Qwen series. It builds on improvements in training data, model architecture, and optimization techniques to provide stronger language processing capabilities.

Large Language Model

Transformers

Open Source License: Apache-2.0 #Long context understanding #Multilingual mixture of experts #Three-stage pre-training

Downloads 1,822

Release Time: 4/28/2025

Model Overview

Qwen3-30B-A3B-Base is a causal language model based on the Mixture of Experts (MoE) architecture, suitable for various natural language processing scenarios.

Model Features

Expanded high-quality pre-training corpus

Pre-trained on 36 trillion tokens across 119 languages, three times the language coverage of Qwen2.5, with a richer mix of high-quality data.

Improvements in training technology and model architecture

Uses a global-batch load balancing loss (for MoE models) and QK layer normalization to improve training stability and overall performance.

Three-stage pre-training

The first stage focuses on language modeling and general knowledge acquisition; the second stage improves reasoning ability; the third stage improves long-context understanding by extending the training sequence length up to 32k tokens.

Hyperparameter adjustment based on scaling laws

Comprehensive scaling law studies across the three-stage pre-training process guide the systematic tuning of key hyperparameters, giving better training dynamics and final performance.

Model Capabilities

Text generation

Language understanding

Logical reasoning

Multilingual processing

Long context understanding

Use Cases

Natural language processing

Text generation

Generate high-quality and coherent text content.

Logical reasoning

Solve complex logical reasoning problems, such as STEM and coding problems.

Multilingual processing

Process text content in multiple languages.

🚀 Qwen3-30B-A3B-Base

Qwen3-30B-A3B-Base is a causal language model in the Qwen series, offering high-quality pre-training and strong performance.

🚀 Quick Start

Support for Qwen3-MoE has been integrated into Hugging Face transformers. We recommend using the latest version of transformers.

With transformers<4.51.0, you will encounter the following error:

KeyError: 'qwen3_moe'
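
To catch this error before a large download starts, the installed version can be checked up front. The following is a minimal sketch, not part of the original model card: the helper names (`parse_version`, `supports_qwen3_moe`, `installed_transformers_supports_qwen3_moe`) are hypothetical, and the only fact taken from the text above is the 4.51.0 threshold. Pre-release suffixes such as `.dev0` are ignored by this simple parser.

```python
import re
from importlib import metadata

# Threshold taken from the note above: versions below 4.51.0 raise KeyError.
MIN_VERSION = (4, 51, 0)


def parse_version(version):
    """Return (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_qwen3_moe(version):
    """True if this transformers version is at least 4.51.0."""
    return parse_version(version) >= MIN_VERSION


def installed_transformers_supports_qwen3_moe():
    """Check the locally installed transformers package, if there is one."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return supports_qwen3_moe(installed)
```

If `installed_transformers_supports_qwen3_moe()` returns False, upgrading transformers should resolve the `KeyError` before the model is loaded with the usual `AutoModelForCausalLM.from_pretrained` call.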

✨ Features

Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, providing a comprehensive set of dense and mixture-of-experts (MoE) models. Based on extensive improvements in training data, model architecture, and optimization techniques, Qwen3 offers the following key enhancements compared to the previously released Qwen2.5:

Expanded Higher-Quality Pre-training Corpus: Qwen3 is pre-trained on 36 trillion tokens in 119 languages, tripling the language coverage of Qwen2.5. It uses a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
Training Techniques and Model Architecture: Qwen3 incorporates a series of training techniques and architectural refinements, such as global-batch load balancing loss for MoE models and qk layernorm for all models, which improve stability and overall performance.
Three-stage Pre-training: Stage 1 focuses on broad language modeling and general knowledge acquisition. Stage 2 enhances reasoning skills in areas like STEM, coding, and logical reasoning. Stage 3 improves long-context comprehension by extending the training sequence length up to 32k tokens.
Scaling Law Guided Hyperparameter Tuning: Through comprehensive scaling law studies across the three-stage pre-training pipeline, Qwen3 systematically tunes critical hyperparameters, such as the learning rate scheduler and batch size, for dense and MoE models separately. This results in better training dynamics and final performance across different model scales.
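
The global-batch load balancing loss mentioned above can be illustrated with a small sketch. This page does not give the exact formulation Qwen3 uses, so the code below is an assumption: it follows the auxiliary loss commonly used in the MoE literature (number of experts times the sum over experts of routed-token fraction times mean router probability) and pools the statistics over all micro-batches, as the "global-batch" name suggests. The function name and input layout are hypothetical.

```python
def load_balance_loss(micro_batches, num_experts, top_k):
    """Auxiliary load balancing loss with statistics pooled over all micro-batches.

    micro_batches: list of micro-batches; each micro-batch is a list of
    per-token router probability lists of length num_experts.
    Illustrative only; not the Qwen3 training code.
    """
    counts = [0] * num_experts       # how often each expert is selected
    prob_sums = [0.0] * num_experts  # summed router probability per expert
    total_tokens = 0

    for batch in micro_batches:
        for probs in batch:
            # Top-k routing: pick the k experts with the highest probability.
            ranked = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)
            for expert in ranked[:top_k]:
                counts[expert] += 1
            for expert in range(num_experts):
                prob_sums[expert] += probs[expert]
            total_tokens += 1

    if total_tokens == 0:
        raise ValueError("no tokens to route")

    routed_fraction = [c / (total_tokens * top_k) for c in counts]
    mean_prob = [s / total_tokens for s in prob_sums]
    return num_experts * sum(f * p for f, p in zip(routed_fraction, mean_prob))


# Four micro-batches of one token each; every token prefers a different expert.
tokens = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
per_micro_batch = [load_balance_loss([[t]], num_experts=4, top_k=1) for t in tokens]
global_batch = load_balance_loss([[t] for t in tokens], num_experts=4, top_k=1)
```

In this toy case each micro-batch looks badly unbalanced on its own (loss about 2.8), while the pooled statistics show the batch as a whole is balanced (loss about 1.0). That difference is the motivation usually given for computing the loss at the global-batch level.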

Model Overview

Qwen3-30B-A3B-Base has the following features:

Property	Details
Model Type	Causal Language Models
Training Stage	Pretraining
Total Number of Parameters	30.5B in total and 3.3B activated
Number of Parameters (Non-Embedding)	29.9B
Number of Layers	48
Number of Attention Heads (GQA)	32 for Q and 4 for KV
Number of Experts	128
Number of Activated Experts	8
Context Length	32,768
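
A few ratios follow directly from the table and help in reading it. The sketch below only does arithmetic on the listed values. The weight-memory helper additionally assumes 2 bytes per parameter (for example bf16), which this page does not state, and it counts weights only, not activations or the KV cache. All names are illustrative.

```python
# Values copied from the table above (parameter counts in billions).
TOTAL_PARAMS_B = 30.5
ACTIVATED_PARAMS_B = 3.3
NON_EMBEDDING_PARAMS_B = 29.9
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 8
Q_HEADS = 32
KV_HEADS = 4


def derived_stats():
    """Ratios implied by the model table."""
    return {
        # Share of all parameters used for any single token.
        "activated_param_fraction": ACTIVATED_PARAMS_B / TOTAL_PARAMS_B,
        # Share of experts selected per token.
        "active_expert_fraction": ACTIVE_EXPERTS / NUM_EXPERTS,
        # Grouped-query attention: query heads sharing each KV head.
        "query_heads_per_kv_head": Q_HEADS // KV_HEADS,
        # Parameters in the embedding layers, in billions.
        "embedding_params_b": round(TOTAL_PARAMS_B - NON_EMBEDDING_PARAMS_B, 1),
    }


def weight_memory_gb(bytes_per_param=2):
    """Approximate weight storage in GB (10^9 bytes); assumes a uniform dtype."""
    return TOTAL_PARAMS_B * bytes_per_param


stats = derived_stats()
```

Under these numbers roughly 11% of the parameters and 6.25% of the experts are active per token, and each KV head is shared by 8 query heads. All 30.5B parameters still have to be stored, which is about 61 GB at 2 bytes per parameter.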

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

📚 Documentation

Evaluation & Performance

Detailed evaluation results are reported in the 📑 Qwen3 blog post (URL in the citation below).

Citation

If you find our work helpful, please cite it.

@misc{qwen3,
    title  = {Qwen3},
    url    = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month  = {April},
    year   = {2025}
}

📄 License

This project is licensed under the Apache-2.0 license.
