Qwen3 4B Base
Qwen3-4B-Base is the 4-billion-parameter base model of the latest-generation Qwen3 series, pre-trained on 36 trillion tokens of multilingual data and supporting a 32k-token context length.
Downloads 50.84k
Release Time: 4/28/2025
Model Overview
Qwen3-4B-Base is a causal language model focused on general language understanding and generation tasks, suitable for various scenarios such as text generation and code completion.
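As a causal language model, Qwen3-4B-Base generates text autoregressively: one token at a time, each conditioned on everything before it. A minimal sketch of that loop, using a hypothetical toy next-token scorer in place of the real 4-billion-parameter network:

```python
# Autoregressive (causal) generation sketch: repeatedly score the current
# context and append the highest-scoring next token.
# `toy_next_token_scores` is a stand-in for a real model's forward pass.

def toy_next_token_scores(context: list) -> dict:
    # Hypothetical scorer: favors one fixed continuation of the prompt.
    table = {
        ("Hello",): {"world": 0.9, "there": 0.1},
        ("Hello", "world"): {"!": 0.8, ".": 0.2},
    }
    return table.get(tuple(context), {"<eos>": 1.0})

def greedy_generate(prompt: list, max_new_tokens: int = 8) -> list:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(tokens)
        next_token = max(scores, key=scores.get)  # greedy: pick the argmax
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(greedy_generate(["Hello"]))  # ['Hello', 'world', '!']
```

Real inference swaps the toy scorer for the model's forward pass over token IDs and typically samples from the score distribution rather than always taking the argmax.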
Model Features
Large-scale multilingual pre-training
Pre-trained on 36 trillion tokens of data covering 119 languages, with language coverage three times that of the previous generation.
Three-stage training optimization
Adopts a three-stage pre-training paradigm: general language modeling → specialized capability enhancement → long-context training.
Long-context support
Supports long contexts of up to 32k tokens.
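At 32k tokens the key-value cache becomes a real memory cost, which is one practical motivation for the grouped-query attention design noted below. A rough sizing sketch; the layer count and head dimension used here are illustrative assumptions, not published specs for this model, while the 32/8 head counts come from this card:

```python
# Rough KV-cache size for a 32k-token context in fp16 (2 bytes per value).
# num_layers and head_dim are illustrative assumptions, not confirmed
# Qwen3-4B-Base specs; the 8 vs. 32 head counts are from the model card.

def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_val=2):
    # Keys and values are each [num_kv_heads, seq_len, head_dim] per layer.
    return 2 * num_layers * num_kv_heads * seq_len * head_dim * bytes_per_val

seq_len, num_layers, head_dim = 32_768, 36, 128

gqa = kv_cache_bytes(seq_len, num_layers, num_kv_heads=8, head_dim=head_dim)
mha = kv_cache_bytes(seq_len, num_layers, num_kv_heads=32, head_dim=head_dim)

print(f"GQA cache (8 KV heads):  {gqa / 2**30:.2f} GiB")
print(f"MHA cache (32 KV heads): {mha / 2**30:.2f} GiB")
print(f"reduction: {mha / gqa:.0f}x")  # 4x fewer KV heads -> 4x smaller cache
```

Caching only 8 KV heads instead of one per query head cuts the cache by the ratio of query heads to KV heads, here 32/8 = 4x, regardless of the other dimensions.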
Efficient attention mechanism
Utilizes Grouped Query Attention (GQA) architecture with 32 query heads and 8 key-value heads.
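In GQA, several query heads share one key-value head: here, 32 query heads split into 8 groups of 4, with each group attending over a single shared K/V head. A minimal NumPy sketch with small illustrative dimensions (the model's actual hidden sizes are not given on this card; causal masking and rotary embeddings are omitted for brevity):

```python
import numpy as np

def gqa_attention(q, k, v, num_kv_heads):
    """Grouped-query attention over one sequence.

    q:    [num_q_heads,  seq_len, head_dim]
    k, v: [num_kv_heads, seq_len, head_dim]
    Each group of num_q_heads // num_kv_heads query heads shares one KV head.
    """
    num_q_heads, seq_len, head_dim = q.shape
    group = num_q_heads // num_kv_heads
    # Broadcast each KV head across its query group: [num_q_heads, seq, dim]
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Softmax over the key dimension
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Toy sizes: 32 query heads and 8 KV heads as on the card, tiny dims otherwise.
rng = np.random.default_rng(0)
q = rng.standard_normal((32, 4, 16))
k = rng.standard_normal((8, 4, 16))
v = rng.standard_normal((8, 4, 16))
out = gqa_attention(q, k, v, num_kv_heads=8)
print(out.shape)  # (32, 4, 16)
```

The point of the grouping is that only the 8 K/V heads need to be cached during generation, while the 32 query heads preserve most of the modeling capacity of full multi-head attention.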
Model Capabilities
Text generation
Multilingual understanding
Code completion
Logical reasoning
Long-text processing
Use Cases
Natural Language Processing
Multilingual text generation
Generates coherent text content in multiple languages.
Supports fluent generation in 119 languages.
Technical document processing
Handles technical documents and code in STEM fields.
Optimized for code and STEM-related data.
Development Assistance
Code completion
Assists programmers in writing and completing code.
Increased proportion of code-related data in pre-training.
© 2025 AIbase