Qwen3-30B-A3B-Base
Qwen3-30B-A3B-Base is a powerful large language model from the Qwen series, offering enhanced performance and capabilities.
Quick Start
The code for Qwen3-MoE has been integrated into the latest Hugging Face `transformers`, and we recommend using the latest version of `transformers`.
With `transformers<4.51.0`, you will encounter the following error:
`KeyError: 'qwen3_moe'`
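As a quick sanity check, the sketch below shows one way to load and run the base model with the standard `transformers` text-generation API (version 4.51.0 or later). The prompt is only an illustration; adjust the dtype, device placement, and generation settings to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B-Base"

# Load the tokenizer and model; requires transformers >= 4.51.0,
# otherwise loading fails with KeyError: 'qwen3_moe'.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # spread the weights across available devices
)

# This is a base (pre-trained) model, so use plain text completion
# rather than a chat template. The prompt is purely illustrative.
prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```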
Features
Qwen3 Highlights
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models.
Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:
- Expanded Higher-Quality Pre-training Corpus: Qwen3 is pre-trained on 36 trillion tokens across 119 languages (tripling the language coverage of Qwen2.5) with a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
- Training Techniques and Model Architecture: Qwen3 incorporates a series of training techniques and architectural refinements, including global-batch load balancing loss for MoE models and qk layernorm for all models, leading to improved stability and overall performance.
- Three-stage Pre-training: Stage 1 focuses on broad language modeling and general knowledge acquisition, Stage 2 improves reasoning skills like STEM, coding, and logical reasoning, and Stage 3 enhances long-context comprehension by extending training sequence lengths up to 32k tokens.
- Scaling Law Guided Hyperparameter Tuning: Through comprehensive scaling law studies across the three-stage pre-training pipeline, Qwen3 systematically tunes critical hyperparameters (such as learning rate scheduler and batch size) separately for dense and MoE models, resulting in better training dynamics and final performance across different model scales.
Model Overview
Qwen3-30B-A3B-Base has the following features:
| Property | Details |
|---|---|
| Type | Causal Language Models |
| Training Stage | Pretraining |
| Number of Parameters | 30.5B in total and 3.3B activated |
| Number of Parameters (Non-Embedding) | 29.9B |
| Number of Layers | 48 |
| Number of Attention Heads (GQA) | 32 for Q and 4 for KV |
| Number of Experts | 128 |
| Number of Activated Experts | 8 |
| Context Length | 32,768 |
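If useful, the architecture numbers in the table can be cross-checked against the configuration shipped with the checkpoint. The sketch below assumes the standard `AutoConfig` API and the field names used by the Qwen3-MoE configuration in `transformers`; exact attribute names may differ across library versions.

```python
from transformers import AutoConfig

# Fetch only the config (no weights) and compare it with the table above.
config = AutoConfig.from_pretrained("Qwen/Qwen3-30B-A3B-Base")

print(config.model_type)               # expected: qwen3_moe
print(config.num_hidden_layers)        # expected: 48 layers
print(config.num_attention_heads)      # expected: 32 query heads
print(config.num_key_value_heads)      # expected: 4 key/value heads (GQA)
print(config.num_experts)              # expected: 128 experts
print(config.num_experts_per_tok)      # expected: 8 activated experts per token
print(config.max_position_embeddings)  # expected: 32,768 context length
```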
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
Documentation
Detailed evaluation results are reported in this blog.
License
This project is licensed under the Apache-2.0 license.
Citation
If you find our work helpful, feel free to cite it.
@misc{qwen3,
title = {Qwen3},
url = {https://qwenlm.github.io/blog/qwen3/},
author = {Qwen Team},
month = {April},
year = {2025}
}