Qwen3-8B-Base: Open-Source Large Language Model Pre-trained on 119 Languages

Qwen3 8B Base

Developed by Qwen

Qwen3 is the latest generation of large language models in the Tongyi Qianwen (Qwen) series. It offers a full suite of dense and mixture-of-experts (MoE) models, pre-trained on 36 trillion tokens across 119 languages.

Large Language Model

Transformers

Open Source License: Apache-2.0 #Multilingual Large Model #32k Long Context Understanding #Enhanced STEM Reasoning

Downloads 26.79k

Release Time: 4/28/2025

Model Overview

Qwen3-8B-Base is a pre-trained (base) causal language model with 8.2 billion parameters. It targets general language modeling with strengthened STEM, coding, and reasoning capabilities, and supports a context length of 32,768 tokens.

Model Features

Multilingual Coverage

Pre-training data covers 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5

Specialized Capability Enhancement

Strengthens specialized capabilities in STEM/programming/logical reasoning through a three-phase pre-training strategy

Long Context Understanding

Supports a 32k-token context window; the third pre-training stage extends training sequence lengths up to 32k tokens to improve long-text comprehension

Training Innovation

Uses qk layer normalization, as do all Qwen3 models; the Qwen3 MoE models additionally use a global-batch load balancing loss

Model Capabilities

Multilingual text generation

Programming code generation

Logical reasoning

Long context understanding

STEM problem solving

Use Cases

Natural Language Processing

Multilingual Text Generation

Generates coherent text content in multiple languages

Pre-training data spans 119 languages

Technical Document Processing

Parses and understands lengthy technical documents

The 32k-token context window allows documents up to that length to be processed in a single pass
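Documents longer than the context window have to be split before they can be processed. The following is a minimal sketch of one common approach, overlapping token windows; it is not something the model card prescribes. The function name `split_into_windows` and the `reserve` and `overlap` defaults are hypothetical choices made for illustration, and the token IDs are assumed to come from the model's own tokenizer.

```python
from typing import List, Sequence

CONTEXT_LENGTH = 32_768  # context length stated for Qwen3-8B-Base


def split_into_windows(
    token_ids: Sequence[int],
    window: int = CONTEXT_LENGTH,
    reserve: int = 512,   # hypothetical: tokens kept free for generated output
    overlap: int = 256,   # hypothetical: tokens shared between neighbouring chunks
) -> List[List[int]]:
    """Split a token sequence into overlapping chunks that each fit the window."""
    budget = window - reserve  # prompt tokens allowed per chunk
    if budget <= overlap:
        raise ValueError("window minus reserve must be larger than overlap")
    step = budget - overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(list(token_ids[start:start + budget]))
        if start + budget >= len(token_ids):
            break  # the last chunk already reaches the end of the document
        start += step
    return chunks


# A document that fits within the budget yields exactly one chunk.
short_doc = list(range(1_000))
print(len(split_into_windows(short_doc)))
```

The overlap trades some redundant computation for continuity across chunk boundaries; a value of zero is also valid if chunks are processed independently.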

Programming Assistance

Code Generation & Completion

Generates programming code based on natural language descriptions

The second pre-training stage emphasizes coding data, which is intended to improve the accuracy of generated code

🚀 Qwen3-8B-Base

Qwen3-8B-Base is a pre-trained causal language model from the Qwen series with 8.2B parameters and a context length of 32,768 tokens.

🚀 Quick Start

Qwen3 support has been integrated into Hugging Face transformers. Using the latest version of transformers is recommended.

If you use transformers<4.51.0, you will encounter the following error:

KeyError: 'qwen3'
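A minimal sketch of how the version requirement could be checked before loading the model. The helper names (`version_tuple`, `supports_qwen3`, `complete`) are hypothetical, the repository ID `Qwen/Qwen3-8B-Base` is assumed to be the Hugging Face Hub name of this model, and `device_map="auto"` assumes the `accelerate` package is installed. Because this is a base model, the sketch uses plain text completion rather than a chat template.

```python
import re

MIN_TRANSFORMERS = (4, 51, 0)  # earlier releases raise KeyError: 'qwen3'


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric parts of a version, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad '5.0' to (5, 0, 0) so comparisons behave
        parts.append(0)
    return tuple(parts)


def supports_qwen3(version: str) -> bool:
    """Return True if this transformers version is new enough for Qwen3."""
    return version_tuple(version) >= MIN_TRANSFORMERS


def complete(prompt: str, model_name: str = "Qwen/Qwen3-8B-Base",
             max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model (downloads the weights on first use)."""
    import transformers
    if not supports_qwen3(transformers.__version__):
        raise RuntimeError(
            f"transformers {transformers.__version__} is too old; "
            "upgrade to 4.51.0 or later"
        )
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `complete("The three primary colors are")` would then return the model's continuation of that text. The explicit check turns the opaque `KeyError` into an actionable message.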

✨ Features

Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:

Expanded Higher-Quality Pre-training Corpus: Qwen3 is pre-trained on 36 trillion tokens across 119 languages, tripling the language coverage of Qwen2.5, with a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
Training Techniques and Model Architecture: Qwen3 incorporates a series of training techniques and architectural refinements, including global-batch load balancing loss for MoE models and qk layernorm for all models, leading to improved stability and overall performance.
Three-stage Pre-training: Stage 1 focuses on broad language modeling and general knowledge acquisition, Stage 2 improves reasoning skills like STEM, coding, and logical reasoning, and Stage 3 enhances long-context comprehension by extending training sequence lengths up to 32k tokens.
Scaling Law Guided Hyperparameter Tuning: Through comprehensive scaling law studies across the three-stage pre-training pipeline, Qwen3 systematically tunes critical hyperparameters, such as learning rate scheduler and batch size, separately for dense and MoE models, resulting in better training dynamics and final performance across different model scales.
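To illustrate why the qk layernorm mentioned above can help stability, the sketch below normalizes the query and key vectors before computing an attention logit. It assumes an RMS-style normalization without a learned gain, which is a simplification chosen for illustration; the text above does not specify the exact normalization used. All function names are hypothetical.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale x so its root-mean-square is approximately 1 (no learned gain)."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def attention_logit(q, k, use_qk_norm):
    """Scaled dot-product logit for one query/key pair of a single head."""
    if use_qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]
big_q = [100.0 * v for v in q]  # simulate activations growing during training

# Without normalization the logit grows with the activation scale.
print(attention_logit(q, k, False), attention_logit(big_q, k, False))

# With normalization the logit is essentially unchanged and stays bounded
# by sqrt(head_dim) in magnitude.
print(attention_logit(q, k, True), attention_logit(big_q, k, True))
```

Keeping the logits bounded in this way prevents the softmax from saturating when activations grow, which is one plausible reading of the stability improvement described above.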

Model Overview

Qwen3-8B-Base has the following features:

Property	Details
Model Type	Causal Language Models
Training Stage	Pretraining
Number of Parameters	8.2B
Number of Parameters (Non-Embedding)	6.95B
Number of Layers	36
Number of Attention Heads (GQA)	32 for Q and 8 for KV
Context Length	32,768
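The figures in the table allow a few back-of-the-envelope calculations. The sketch below derives the embedding parameter count, the GQA group size, and a rough KV-cache estimate. The per-head dimension of 128 and the 2-byte (bf16/fp16) cache precision are assumptions that do not appear in the table, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qwen3_8BBaseSpec:
    """Values copied from the property table above."""
    total_params: float = 8.2e9
    non_embedding_params: float = 6.95e9
    num_layers: int = 36
    num_q_heads: int = 32
    num_kv_heads: int = 8
    context_length: int = 32_768


SPEC = Qwen3_8BBaseSpec()


def embedding_params(spec: Qwen3_8BBaseSpec) -> float:
    """Parameters that belong to embedding layers."""
    return spec.total_params - spec.non_embedding_params


def queries_per_kv_head(spec: Qwen3_8BBaseSpec) -> int:
    """How many query heads share each key/value head under GQA."""
    return spec.num_q_heads // spec.num_kv_heads


def kv_cache_bytes(spec: Qwen3_8BBaseSpec, num_tokens: int,
                   head_dim: int = 128,       # assumption, not in the table
                   bytes_per_value: int = 2,  # assumption: bf16/fp16 cache
                   kv_heads=None) -> int:
    """Approximate KV-cache size: keys plus values for every layer and token."""
    heads = spec.num_kv_heads if kv_heads is None else kv_heads
    return 2 * spec.num_layers * heads * head_dim * bytes_per_value * num_tokens


gqa = kv_cache_bytes(SPEC, SPEC.context_length)
mha = kv_cache_bytes(SPEC, SPEC.context_length, kv_heads=SPEC.num_q_heads)
print(f"embedding parameters: {embedding_params(SPEC) / 1e9:.2f}B")
print(f"query heads per KV head: {queries_per_kv_head(SPEC)}")
print(f"KV cache at full context: {gqa / 2**30:.1f} GiB "
      f"(vs {mha / 2**30:.1f} GiB with 32 KV heads)")
```

Under these assumptions, sharing each key/value head among four query heads cuts the full-context cache to a quarter of what 32 separate KV heads would need.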

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

📚 Documentation

Detailed evaluation results are reported in the Qwen3 blog post (https://qwenlm.github.io/blog/qwen3/).

Citation

If you find our work helpful, feel free to cite us.

@misc{qwen3,
    title  = {Qwen3},
    url    = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month  = {April},
    year   = {2025}
}

📄 License

This project is licensed under the Apache-2.0 license.
