Qwen3 Embedding 8B W4A16 G128
A GPTQ-quantized version of Qwen3-Embedding-8B that significantly reduces VRAM requirements while maintaining near-original embedding quality
Release Time: 6/6/2025
Model Overview
A 4-bit GPTQ-quantized build of Qwen3-Embedding-8B for text embedding tasks. Quantization cuts VRAM usage enough to run on a single 24 GB consumer GPU while keeping C-MTEB scores close to the original model
Model Features
VRAM optimization
VRAM usage is reduced from about 24 GB to 19,624 MB (≈19.2 GB), so the model fits on a single RTX 3090/4090 (24 GB) card
Performance retention
Only a 0.81% performance drop on the C-MTEB benchmark; the quantized model's scores remain close to the original's
Efficient quantization
Adopts the W4A16 quantization scheme (4-bit weights, 16-bit activations) with a group size of 128 (G128)
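To illustrate what W4A16 G128 means, the sketch below (not part of the model card; a minimal numpy illustration, not the actual GPTQ algorithm) quantizes weights to 4-bit integers with one fp16 scale and zero-point per group of 128 values, while activations stay in 16-bit float:

```python
import numpy as np

def quantize_w4a16_g128(w, group_size=128):
    """Quantize a flat weight tensor to 4-bit integers with one fp16
    scale/zero-point per group of `group_size` values (asymmetric)."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # 4 bits -> 16 levels (0..15)
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.uint8)
    return q, scale.astype(np.float16), zero.astype(np.float16)

def dequantize(q, scale, zero):
    """Reconstruct fp16 weights; activations remain fp16 (the 'A16' part)."""
    return ((q.astype(np.float16) - zero) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)    # toy stand-in for a weight row
q, s, z = quantize_w4a16_g128(w)
w_hat = dequantize(q, s, z)
print(np.abs(w - w_hat).max())                  # small reconstruction error
```

GPTQ additionally minimizes layer output error when choosing the quantized values, but the storage format (4-bit codes plus per-group-of-128 scales) is what drives the VRAM savings described above.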
Model Capabilities
Text vectorization
Semantic similarity calculation
Information retrieval
Text classification
Text clustering
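All of the capabilities above reduce to comparing embedding vectors. A minimal sketch (synthetic vectors standing in for real model outputs; the 4096-dim size is an assumption for illustration) of the core cosine-similarity operation:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice each text is encoded by the model.
rng = np.random.default_rng(1)
query = rng.normal(size=4096)
doc = query + 0.1 * rng.normal(size=4096)   # near-duplicate document
other = rng.normal(size=4096)               # unrelated document
print(cosine_sim(query, doc), cosine_sim(query, other))
```

Retrieval ranks documents by this score, classification and clustering treat the vectors as features, and STS reports the score directly.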
Use Cases
Information retrieval
Document search
Convert queries and documents into embedding vectors for similarity matching
Scored 77.39 on the retrieval task
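A document-search loop can be sketched as a top-k ranking over normalized embeddings (a hypothetical numpy illustration with synthetic vectors, not the model's own API):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=2):
    """Return indices and scores of the k documents most similar to the
    query, using cosine similarity via normalized dot products."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(3)
query = rng.normal(size=64)
docs = rng.normal(size=(5, 64))
docs[0] = query + 0.05 * rng.normal(size=64)  # doc 0 is the relevant one
order, scores = top_k(query, docs)
print(order)
```

In production the documents are embedded once offline and only the query is embedded per request, which is where the reduced VRAM footprint of the quantized model pays off.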
Text classification
Multi-class classification
Use embedding vectors as features for multi-class text classification
Scored 76.85 on the classification task
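One simple way to classify with embeddings, sketched below with synthetic vectors (a nearest-centroid classifier chosen for illustration; it is not the method the benchmark used):

```python
import numpy as np

def nearest_centroid_predict(x, centroids):
    """Assign an embedding to the class whose centroid is closest."""
    d = np.linalg.norm(centroids - x, axis=1)
    return int(np.argmin(d))

# Toy 2-class setup; real features would be the model's text embeddings.
rng = np.random.default_rng(2)
c0, c1 = rng.normal(size=8), rng.normal(size=8) + 5.0
train0 = c0 + 0.1 * rng.normal(size=(10, 8))
train1 = c1 + 0.1 * rng.normal(size=(10, 8))
centroids = np.stack([train0.mean(axis=0), train1.mean(axis=0)])
print(nearest_centroid_predict(c1 + 0.1, centroids))
```

Because good embeddings place semantically similar texts close together, even such a simple classifier on top of them can be competitive; a linear probe or k-NN are common alternatives.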
Semantic analysis
Semantic similarity calculation
Compute the semantic similarity between text pairs
Scored 62.80 on the STS (semantic textual similarity) task