
Qwen3 0.6B Base Unsloth Bnb 4bit

Developed by unsloth
Qwen3-0.6B-Base belongs to the latest generation of the Qwen (Tongyi Qianwen) series of large language models. It has 0.6B parameters, supports 119 languages, and handles context lengths of up to 32,768 tokens.
Downloads: 10.84k
Release time: 4/28/2025

Model Overview

Qwen3-0.6B-Base is a pre-trained causal language model whose pre-training focuses on broad language modeling and general knowledge acquisition, while also building in reasoning ability and long-context understanding.
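As a base (non-instruct) checkpoint, the model is used for plain text continuation. Below is a minimal loading sketch with Hugging Face transformers; the repo id, prompt, and sampling settings are assumptions for illustration, not taken from this page, and the checkpoint is assumed to ship pre-quantized bitsandbytes 4-bit weights (so transformers, accelerate, and bitsandbytes must be installed).

```python
# Minimal sketch: load the 4-bit checkpoint and continue a prompt.
# The repo id below is an assumption based on the model name on this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-0.6B-Base-unsloth-bnb-4bit"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place weights on the available GPU automatically
    torch_dtype="auto",  # keep the dtypes stored in the checkpoint
)

# A base (non-instruct) model does plain text continuation rather than chat.
prompt = "Photosynthesis is the process by which"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```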

Model Features

Multilingual support
Pre-trained on 36 trillion tokens across 119 languages, giving broad language coverage.
Three-stage pre-training
The first stage focuses on language modeling and general knowledge acquisition; the second stage improves reasoning capabilities; the third stage enhances long-context understanding.
Optimized training techniques
Adopts techniques such as a global-batch load-balancing loss and QK layer normalization (QK-Norm) to improve model stability and performance.
Long context understanding
Supports a context length of up to 32,768 tokens, making it well suited to long-text tasks.
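As a rough illustration of working within that window, the sketch below reuses the model and tokenizer loaded earlier, reads the context size from the model config, and truncates a long input to fit; the fallback value and the input file name are assumptions for illustration.

```python
# Sketch: stay within the 32,768-token context window when feeding long inputs.
# Reuses `model` and `tokenizer` from the loading sketch above; `report.txt`
# is a hypothetical input file.
max_ctx = getattr(model.config, "max_position_embeddings", 32768)
print(f"context window: {max_ctx} tokens")

with open("report.txt", encoding="utf-8") as f:
    long_text = f.read()

enc = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=max_ctx)
print(f"input length after truncation: {enc['input_ids'].shape[-1]} tokens")
```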

Model Capabilities

Text generation
Language modeling
Multilingual processing
Long context understanding
Logical reasoning

Use Cases

Natural language processing
Multilingual text generation
Generate coherent text in multiple languages
Supports fluent generation in 119 languages
Long document summarization
Process and understand the content of long documents and generate summaries
Benefits from the 32,768-token context window (see the sketch after this list)
Education
STEM question answering
Answer questions related to science, technology, engineering, and mathematics
The STEM capabilities are specifically strengthened in the second stage of pre-training
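For the long-document summarization use case above, note that a base model has no chat template, so a summary has to be elicited with plain-text prompting. The sketch below reuses the model and tokenizer from the loading example; the prompt wording, placeholder document, and length budget are illustrative assumptions, not an official recipe.

```python
# Sketch: prompted summarization with the base model (plain text continuation).
# `document` is placeholder text; the prompt wording is an illustrative choice.
document = "...long document text..."

# Truncate the document first so the trailing "Summary:" cue is never cut off.
doc_ids = tokenizer(document, truncation=True, max_length=30000)["input_ids"]
document = tokenizer.decode(doc_ids, skip_special_tokens=True)

prompt = f"Document:\n{document}\n\nSummary:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens (the summary).
summary = tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(summary)
```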