Qwen3 Embedding 8B W4A16 G128
A GPTQ-quantized version of Qwen3-Embedding-8B that significantly reduces VRAM requirements while maintaining near-original embedding quality
Release Time: 6/6/2025
Model Overview
A 4-bit GPTQ-quantized build of Qwen3-Embedding-8B for text embedding tasks. Quantization cuts VRAM usage enough to run on a single 24 GB consumer GPU while keeping C-MTEB scores close to the original model
Model Features
VRAM optimization
VRAM usage is reduced from about 24 GB to 19,624 MB (≈19.2 GB), so the model fits on a single RTX 3090/4090 (24 GB) card
Performance retention
Only a 0.81% performance drop on the C-MTEB benchmark; the quantized model's scores remain close to the original's
Efficient quantization
Adopts the W4A16 quantization scheme (4-bit weights, 16-bit activations) with a group size of 128 (G128)
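To illustrate what W4A16 G128 means, the sketch below (not part of the model card; a minimal numpy illustration, not the actual GPTQ algorithm) quantizes weights to 4-bit integers with one fp16 scale and zero-point per group of 128 values, while activations stay in 16-bit float:

```python
import numpy as np

def quantize_w4a16_g128(w, group_size=128):
    """Quantize a flat weight tensor to 4-bit integers with one fp16
    scale/zero-point per group of `group_size` values (asymmetric)."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # 4 bits -> 16 levels (0..15)
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale.astype(np.float16), zero.astype(np.float16)

def dequantize(q, scale, zero):
    """Reconstruct fp16 weights; activations remain fp16 (the 'A16' part)."""
    return ((q.astype(np.float16) - zero) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)    # toy stand-in for a weight row
q, s, z = quantize_w4a16_g128(w)
w_hat = dequantize(q, s, z)
print(np.abs(w - w_hat).max())                  # small reconstruction error
```

GPTQ additionally minimizes layer output error when choosing the quantized values, but the storage format (4-bit codes plus per-group-of-128 scales) is what drives the VRAM savings described above.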
Model Capabilities
Text vectorization
Semantic similarity calculation
Information retrieval
Text classification
Text clustering
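All of the capabilities above reduce to comparing embedding vectors. A minimal sketch (synthetic vectors standing in for real model outputs; the 4096-dim size is an assumption for illustration) of the core cosine-similarity operation:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice each text is encoded by the model.
rng = np.random.default_rng(1)
query = rng.normal(size=4096)
doc = query + 0.1 * rng.normal(size=4096)   # near-duplicate document
other = rng.normal(size=4096)               # unrelated document
print(cosine_sim(query, doc), cosine_sim(query, other))
```

Retrieval ranks documents by this score, classification and clustering treat the vectors as features, and STS reports the score directly.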
Use Cases
Information retrieval
Document search
Convert queries and documents into embedding vectors for similarity matching
Scored 77.39 on the retrieval task
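A document-search loop can be sketched as a top-k ranking over normalized embeddings (a hypothetical numpy illustration with synthetic vectors, not the model's own API):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=2):
    """Return indices and scores of the k documents most similar to the
    query, using cosine similarity via normalized dot products."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(3)
query = rng.normal(size=64)
docs = rng.normal(size=(5, 64))
docs[0] = query + 0.05 * rng.normal(size=64)  # doc 0 is the relevant one
order, scores = top_k(query, docs)
print(order)
```

In production the documents are embedded once offline and only the query is embedded per request, which is where the reduced VRAM footprint of the quantized model pays off.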
Text classification
Multi-class classification
Use embedding vectors as features for multi-class text classification
Scored 76.85 on the classification task
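One simple way to classify with embeddings, sketched below with synthetic vectors (a nearest-centroid classifier chosen for illustration; it is not the method the benchmark used):

```python
import numpy as np

def nearest_centroid_predict(x, centroids):
    """Assign an embedding to the class whose centroid is closest."""
    d = np.linalg.norm(centroids - x, axis=1)
    return int(np.argmin(d))

# Toy 2-class setup; real features would be the model's text embeddings.
rng = np.random.default_rng(2)
c0, c1 = rng.normal(size=8), rng.normal(size=8) + 5.0
train0 = c0 + 0.1 * rng.normal(size=(10, 8))
train1 = c1 + 0.1 * rng.normal(size=(10, 8))
centroids = np.stack([train0.mean(axis=0), train1.mean(axis=0)])
print(nearest_centroid_predict(c1 + 0.1, centroids))
```

Because good embeddings place semantically similar texts close together, even such a simple classifier on top of them can be competitive; a linear probe or k-NN are common alternatives.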
Semantic analysis
Semantic similarity calculation
Compute the semantic similarity between text pairs
Scored 62.80 on the STS (semantic textual similarity) task