Qwen3 4B Base
Qwen3-4B-Base is the 4-billion-parameter base model of the latest-generation Qwen3 series, pre-trained on 36 trillion tokens of multilingual data and supporting a 32k-token context length.
Downloads 50.84k
Release Time: 4/28/2025
Model Overview
Qwen3-4B-Base is a causal language model focused on general language understanding and generation tasks, suitable for various scenarios such as text generation and code completion.
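As a causal language model, Qwen3-4B-Base generates text autoregressively: one token at a time, each conditioned on everything before it. A minimal sketch of that loop, using a hypothetical toy next-token scorer in place of the real 4-billion-parameter network:

```python
# Autoregressive (causal) generation sketch: repeatedly score the current
# context and append the highest-scoring next token.
# `toy_next_token_scores` is a stand-in for a real model's forward pass.

def toy_next_token_scores(context: list) -> dict:
    # Hypothetical scorer: favors one fixed continuation of the prompt.
    table = {
        ("Hello",): {"world": 0.9, "there": 0.1},
        ("Hello", "world"): {"!": 0.8, ".": 0.2},
    }
    return table.get(tuple(context), {"<eos>": 1.0})

def greedy_generate(prompt: list, max_new_tokens: int = 8) -> list:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(tokens)
        next_token = max(scores, key=scores.get)  # greedy: pick the argmax
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(greedy_generate(["Hello"]))  # ['Hello', 'world', '!']
```

Real inference swaps the toy scorer for the model's forward pass over token IDs and typically samples from the score distribution rather than always taking the argmax.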
Model Features
Large-scale multilingual pre-training
Pre-trained on 36 trillion tokens of data covering 119 languages, with language coverage three times that of the previous generation.
Three-stage training optimization
Adopts a three-stage pre-training paradigm: general language modeling → specialized capability enhancement → long-context training.
Long-context support
Supports long contexts of up to 32k tokens.
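At 32k tokens the key-value cache becomes a real memory cost, which is one practical motivation for the grouped-query attention design noted below. A rough sizing sketch; the layer count and head dimension used here are illustrative assumptions, not published specs for this model, while the 32/8 head counts come from this card:

```python
# Rough KV-cache size for a 32k-token context in fp16 (2 bytes per value).
# num_layers and head_dim are illustrative assumptions, not confirmed
# Qwen3-4B-Base specs; the 8 vs. 32 head counts are from the model card.

def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_val=2):
    # Keys and values are each [num_kv_heads, seq_len, head_dim] per layer.
    return 2 * num_layers * num_kv_heads * seq_len * head_dim * bytes_per_val

seq_len, num_layers, head_dim = 32_768, 36, 128

gqa = kv_cache_bytes(seq_len, num_layers, num_kv_heads=8, head_dim=head_dim)
mha = kv_cache_bytes(seq_len, num_layers, num_kv_heads=32, head_dim=head_dim)

print(f"GQA cache (8 KV heads):  {gqa / 2**30:.2f} GiB")
print(f"MHA cache (32 KV heads): {mha / 2**30:.2f} GiB")
print(f"reduction: {mha / gqa:.0f}x")  # 4x fewer KV heads -> 4x smaller cache
```

Caching only 8 KV heads instead of one per query head cuts the cache by the ratio of query heads to KV heads, here 32/8 = 4x, regardless of the other dimensions.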
Efficient attention mechanism
Utilizes Grouped Query Attention (GQA) architecture with 32 query heads and 8 key-value heads.
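In GQA, several query heads share one key-value head: here, 32 query heads split into 8 groups of 4, with each group attending over a single shared K/V head. A minimal NumPy sketch with small illustrative dimensions (the model's actual hidden sizes are not given on this card; causal masking and rotary embeddings are omitted for brevity):

```python
import numpy as np

def gqa_attention(q, k, v, num_kv_heads):
    """Grouped-query attention over one sequence.

    q:    [num_q_heads,  seq_len, head_dim]
    k, v: [num_kv_heads, seq_len, head_dim]
    Each group of num_q_heads // num_kv_heads query heads shares one KV head.
    """
    num_q_heads, seq_len, head_dim = q.shape
    group = num_q_heads // num_kv_heads
    # Broadcast each KV head across its query group: [num_q_heads, seq, dim]
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Softmax over the key dimension
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Toy sizes: 32 query heads and 8 KV heads as on the card, tiny dims otherwise.
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 4, 16))
k = rng.standard_normal((8, 4, 16))
v = rng.standard_normal((8, 4, 16))
out = gqa_attention(q, k, v, num_kv_heads=8)
print(out.shape)  # (32, 4, 16)
```

The point of the grouping is that only the 8 K/V heads need to be cached during generation, while the 32 query heads preserve most of the modeling capacity of full multi-head attention.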
Model Capabilities
Text generation
Multilingual understanding
Code completion
Logical reasoning
Long-text processing
Use Cases
Natural Language Processing
Multilingual text generation
Generates coherent text content in multiple languages.
Supports fluent generation in 119 languages.
Technical document processing
Handles technical documents and code in STEM fields.
Optimized for code and STEM-related data.
Development Assistance
Code completion
Assists programmers in writing and completing code.
Increased proportion of code-related data in pre-training.
© 2025 AIbase