Qwen3 Embedding 4B W4A16 G128
This is the Qwen3-Embedding-4B model after GPTQ quantization, with significantly reduced GPU memory usage and minimal performance loss.
Downloads 141
Release Time: 6/6/2025
Model Overview
Qwen3-Embedding-4B-W4A16-G128 is a GPTQ-quantized version of the Qwen/Qwen3-Embedding-4B model. It is intended mainly for text embedding tasks and supports multilingual processing.
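The model name suggests the common convention of 4-bit weights (W4), 16-bit activations (A16), and a quantization group size of 128 (G128). That reading is an assumption, since the page does not spell it out. The sketch below illustrates only the group-wise idea, using plain round-to-nearest quantization. It is not GPTQ itself, which additionally compensates for rounding error, and the function names are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (4-bit codes, scale, minimum)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest quantization of one group (NOT the GPTQ algorithm)."""
    levels = (1 << bits) - 1  # 15 distinct steps for 4 bits
    lo, hi = min(weights), max(weights)
    # Each group stores its own scale and minimum, so outliers in one
    # group do not degrade the precision of the others.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(group: Group) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    codes, scale, lo = group
    return [c * scale + lo for c in codes]


def quantize(weights: List[float], group_size: int = 128, bits: int = 4) -> List[Group]:
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Toy weights: 256 values, so two groups of 128 (assumed group size).
toy_weights = [i / 100 for i in range(-128, 128)]
groups = quantize(toy_weights)
restored = [w for g in groups for w in dequantize_group(g)]
max_error = max(abs(a - b) for a, b in zip(toy_weights, restored))
```

With round-to-nearest, the reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups generally give a closer approximation at the cost of storing more scales.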
Model Features
Efficient quantization
GPTQ quantization reduces GPU memory usage from 17430 MB to 11000 MB (measured without FlashAttention-2, FA2).
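As a quick check of what these figures imply, the saving can be worked out directly. This is a minimal sketch that reads the quoted figures as megabytes, which is an assumption about the unit; the helper name is hypothetical.

```python
def memory_savings(before_mb: float, after_mb: float) -> tuple:
    """Return the absolute saving and the saving as a percentage of the original."""
    saved = before_mb - after_mb
    return saved, saved / before_mb * 100


# Figures quoted above: 17430 before quantization, 11000 after.
saved_mb, saved_pct = memory_savings(17430, 11000)
print(f"saved {saved_mb:.0f} MB ({saved_pct:.1f}%)")  # saved 6430 MB (36.9%)
```

The reduction is smaller than the ratio of 4-bit to 16-bit weights alone would suggest, which is plausible because the reported number covers total GPU memory usage, not just the weights.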
Minimal performance loss
The performance loss is only about 0.72% on the C-MTEB evaluation, so the model retains most of the original model's quality.
Multilingual support
Supports multilingual text embedding tasks, making it suitable for international applications.
Model Capabilities
Text embedding
Multilingual processing
Efficient inference
Use Cases
Information retrieval
Document retrieval
Used in large-scale document retrieval systems to improve retrieval efficiency and accuracy.
The score for the retrieval task in the C-MTEB evaluation is 76.15.
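Embedding-based retrieval typically ranks documents by the similarity between a query vector and each document vector. The sketch below shows that ranking step with cosine similarity. The three-dimensional vectors are made-up stand-ins for real model output, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float], doc_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, not produced by the model).
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.2],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real system the vectors would come from the embedding model, and a vector index would usually replace the linear scan once the collection grows large.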
Text classification
Sentiment analysis
Used for text sentiment classification tasks, providing high-quality text embedding representations.
The score for the classification task in the C-MTEB evaluation is 75.43.
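One simple way to use embeddings for sentiment classification is to average the embeddings of labeled examples into one centroid per class and assign new text to the most similar centroid. The sketch below assumes that approach. It is not necessarily how the C-MTEB classification score was obtained, the two-dimensional vectors are made-up stand-ins for model output, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit_centroids(labeled: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Compute one centroid per label from its example embeddings."""
    return {label: mean_vector(vectors) for label, vectors in labeled.items()}


def classify(vec: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to the input vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training embeddings (assumed values, not produced by the model).
train = {
    "positive": [[0.9, 0.1], [0.8, 0.2]],
    "negative": [[0.1, 0.9], [0.2, 0.7]],
}
centroids = fit_centroids(train)
label = classify([0.7, 0.3], centroids)
print(label)  # positive
```

A trained classifier such as logistic regression on top of the embeddings is a common alternative when more labeled data is available.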