Qwen3-1.7B-Base Unsloth bnb 4-bit
Qwen3-1.7B-Base is a latest-generation large language model in the Qwen series, providing high-quality pretrained language modeling capabilities.
Downloads: 689
Release Date: 4/28/2025
Model Overview
Qwen3-1.7B-Base is a 1.7 billion parameter causal language model focused on general language modeling and knowledge acquisition, supporting 32k ultra-long context understanding.
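A minimal loading sketch, assuming the repo id `unsloth/Qwen3-1.7B-Base-unsloth-bnb-4bit` (inferred from the page title, not stated in the text) and a CUDA GPU with `bitsandbytes` installed; the pre-quantized 4-bit checkpoint can be loaded directly with Hugging Face transformers:

```python
# Sketch: load the pre-quantized bnb 4-bit checkpoint and run a plain completion.
# The repo id below is an assumption based on the page title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-1.7B-Base-unsloth-bnb-4bit"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place the 4-bit weights on the available GPU
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized layers
)

# Base model: plain text continuation, no chat template.
inputs = tokenizer("The three laws of thermodynamics are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base (non-instruct) model, prompts are continued as plain text rather than answered in a chat format.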
Model Features
High-Quality Pretraining Corpus
The pretraining corpus covers 36 trillion tokens across 119 languages, with a significantly increased proportion of high-value content such as programming, STEM, and reasoning data.
Training Techniques and Architecture Optimization
Employs techniques such as global-batch load-balancing loss for MoE models and QK layer normalization across all models to improve training stability and performance.
Three-Stage Pretraining System
Three pretraining stages successively strengthen general language modeling, STEM/programming/logical reasoning, and long-context understanding.
Ultra-Long Context Support
Trained with a 32k context length, enhancing long-text comprehension (a loading sketch that sets the full context window follows below).
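A minimal sketch of loading the model at the full 32k context window with Unsloth, again assuming the repo id from the page title and that the `unsloth` package is installed:

```python
# Sketch: load via Unsloth with the 32k context length described above.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B-Base-unsloth-bnb-4bit",  # assumed repo id
    max_seq_length=32768,  # matches the 32k context support
    load_in_4bit=True,     # keep weights in bitsandbytes 4-bit form
)
```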
Model Capabilities
Text Generation
Language Understanding
Programming Capability
Logical Reasoning
Long-Text Processing
Use Cases
Natural Language Processing
Text Generation
Generate high-quality natural language text
Programming Assistance
Assist in writing and optimizing code (see the completion sketch after this list)
Education
STEM Education
Assist in learning and teaching in STEM fields
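For the Programming Assistance use case, a base model completes a code prefix rather than following instructions. A self-contained sketch under the same assumed repo id, with a hypothetical prompt:

```python
# Sketch: code completion with the base model (assumed repo id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-1.7B-Base-unsloth-bnb-4bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prompt with an unfinished function; the model continues the code.
prompt = (
    "def fibonacci(n: int) -> int:\n"
    '    """Return the n-th Fibonacci number."""\n'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=96, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```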