Qwen3-Embedding-4B-W4A16-G128 Open-source Model - Significantly reduces GPU memory usage, with minimal performance loss after quantization

Qwen3 Embedding 4B W4A16 G128

Developed by boboliu

This is the Qwen3-Embedding-4B model after GPTQ quantization. It uses significantly less GPU memory (VRAM) than the original, with minimal performance loss.

Text Embedding

Open Source License: Apache-2.0

#Multilingual embedding #GPTQ quantization #Efficient retrieval

Downloads 141

Release date: 6/6/2025

Model Overview

Qwen3-Embedding-4B-W4A16-G128 is a GPTQ-quantized version of the Qwen/Qwen3-Embedding-4B model. It is mainly used for text embedding tasks and supports multilingual processing.
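The page does not spell out the "W4A16-G128" suffix. By the usual naming convention (an assumption here, not stated on the page) it means 4-bit weights, 16-bit activations, and one quantization scale per group of 128 weights. The sketch below illustrates group-wise 4-bit quantization using plain round-to-nearest. It is not GPTQ itself, which additionally compensates for quantization error while processing the weights. The function names are hypothetical.

```python
import math

def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    levels = (1 << bits) - 1                      # 15 steps for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # fall back to 1.0 for constant groups
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min

def dequantize_group(codes, scale, w_min):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]

def quantize_row(row, group_size=128, bits=4):
    """Quantize a weight row in consecutive groups, one scale/offset per group."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]

# Toy "weight row" of 256 values, so it splits into two groups of 128.
row = [math.sin(i) for i in range(256)]
groups = quantize_row(row)

# Reconstruct the row and measure the worst-case rounding error.
restored = [w for codes, scale, w_min in groups
            for w in dequantize_group(codes, scale, w_min)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, max abs error:", max_error)
```

Each group stores 128 small integer codes plus one scale and one offset, which is where the memory saving comes from. Smaller groups track the weights more closely but carry more per-group overhead.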

Model Features

Efficient quantization

GPTQ quantization reduces GPU memory usage from 17430 MB to 11000 MB (without using FA2, i.e. FlashAttention 2).
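As a quick check on the size of that saving, the arithmetic can be worked through directly. This is a minimal sketch using only the two figures quoted above. The variable names are illustrative, and the "M" unit is read here as megabytes.

```python
# Memory figures as quoted on this page (reported without FA2).
before_mb = 17430
after_mb = 11000

saved_mb = before_mb - after_mb
saved_fraction = saved_mb / before_mb

print(f"saved: {saved_mb} MB ({saved_fraction:.1%} of the original footprint)")
```

By these numbers the quantized model needs roughly 37% less GPU memory than the original.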

Minimal performance loss

The performance loss in the C-MTEB evaluation is only about 0.72%, so the model retains most of the original model's performance.
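The page does not say how the 0.72% figure was derived. It appears to match the relative drop in the Mean(Task) score from the C-MTEB table on this page (72.27 for Qwen3-Embedding-4B, 71.75 for this model). The sketch below reproduces that calculation. The link to Mean(Task) is an inference from the table, not something the page states.

```python
# Mean(Task) scores taken from the C-MTEB table on this page.
baseline_mean_task = 72.27    # Qwen3-Embedding-4B
quantized_mean_task = 71.75   # this model (4B-W4A16)

absolute_drop = baseline_mean_task - quantized_mean_task
relative_drop = absolute_drop / baseline_mean_task

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop:.2%}")
```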

Multilingual support

Supports multilingual text embedding tasks and is suitable for international application scenarios.

Model Capabilities

Text embedding

Multilingual processing

Efficient inference

Use Cases

Information retrieval

Document retrieval

Used in large-scale document retrieval systems to improve retrieval efficiency and accuracy.

The score for the retrieval task in the C-MTEB evaluation is 76.15.
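Embedding-based retrieval typically ranks documents by the cosine similarity between the query embedding and each document embedding. The sketch below shows that ranking step in isolation. The 3-dimensional vectors are made-up stand-ins for real model output, and the helper names are hypothetical. In practice the vectors would come from the embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical values).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

For large collections, an approximate nearest-neighbor index would normally replace this exhaustive loop.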

Text classification

Sentiment analysis

Used for text sentiment classification tasks, providing high-quality text embedding representations.

The score for the classification task in the C-MTEB evaluation is 75.43.
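One simple way to use embeddings for sentiment classification is a nearest-centroid classifier: average the embeddings of labeled examples per class, then assign new text to the class whose centroid is most similar. The sketch below assumes this approach purely for illustration. The page does not prescribe a classifier, and the 2-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(vec, centroids[label]))

# Toy labeled embeddings (hypothetical values, not real model output).
labeled = {
    "positive": [[0.9, 0.1], [0.8, 0.3]],
    "negative": [[0.1, 0.9], [0.2, 0.7]],
}
centroids = {label: centroid(vecs) for label, vecs in labeled.items()}

prediction = classify([0.7, 0.2], centroids)
print(prediction)
```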

C-MTEB	Param.	Mean(Task)	Mean(Type)	Class.	Clust.	Pair Class.	Rerank.	Retr.	STS
multilingual-e5-large-instruct	0.6B	58.08	58.24	69.80	48.23	64.52	57.45	63.65	45.81
bge-multilingual-gemma2	9B	67.64	68.52	75.31	59.30	86.67	68.28	73.73	55.19
gte-Qwen2-1.5B-instruct	1.5B	67.12	67.79	72.53	54.61	79.5	68.21	71.86	60.05
gte-Qwen2-7B-instruct	7.6B	71.62	72.19	75.77	66.06	81.16	69.24	75.70	65.20
ritrieve_zh_v1	0.3B	72.71	73.85	76.88	66.5	85.98	72.86	76.97	63.92
Qwen3-Embedding-4B	4B	72.27	73.51	75.46	77.89	83.34	66.05	77.03	61.26
This Model	4B-W4A16	71.75	73.05	75.43	77.51	83.04	65.73	76.15	60.47
