Qwen3 Reranker 4B W4A16 G128
This model is the result of GPTQ quantization applied to Qwen/Qwen3-Reranker-4B, significantly reducing VRAM usage.
Release time: 6/7/2025
Model Overview
A quantized version of Qwen3-Reranker-4B, intended for text re-ranking tasks. GPTQ quantization improves its VRAM usage efficiency.
Model Features
VRAM Optimization
VRAM usage drops from roughly 17,430 MB to 11,000 MB (without FlashAttention-2), substantially improving resource efficiency.
Accuracy Maintenance
Despite the large reduction in VRAM usage, the expected accuracy loss is under 5%; the corresponding Embedding version loses only about 0.7%.
Quantization Technology
Uses GPTQ quantization (W4A16, group size 128) to compress the model and speed up inference.
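The savings from the W4A16-G128 configuration can be estimated with back-of-the-envelope arithmetic. This is a minimal sketch, assuming one fp16 scale and one 4-bit zero-point per 128-weight group (a common GPTQ layout, not stated in this card), and ignoring unquantized layers such as embeddings:

```python
# Rough weight-memory estimate for W4A16 with group size 128.
# Assumption: each group of 128 weights stores one fp16 scale (2 bytes)
# and one 4-bit zero-point (0.5 bytes); embeddings/activations ignored.

def w4a16_bytes_per_weight(group_size: int = 128) -> float:
    weight = 4 / 8           # 4-bit quantized weight
    scale = 2 / group_size   # one fp16 scale per group
    zero = 0.5 / group_size  # one 4-bit zero-point per group
    return weight + scale + zero

params = 4e9                              # ~4B parameters
fp16_gb = params * 2 / 1e9                # 16-bit baseline
w4_gb = params * w4a16_bytes_per_weight() / 1e9

print(f"fp16 weights: {fp16_gb:.1f} GB")         # 8.0 GB
print(f"W4A16-G128 weights: {w4_gb:.2f} GB")     # 2.08 GB
print(f"compression: {fp16_gb / w4_gb:.2f}x")    # 3.85x
```

The observed end-to-end VRAM drop (17,430 MB to 11,000 MB) is smaller than this ~3.85x weight-only compression because activations, the KV cache, and any unquantized layers are unaffected by weight quantization.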
Model Capabilities
Text Re-ranking
Text Relevance Scoring
Information Retrieval Optimization
Use Cases
Information Retrieval
Search Engine Result Optimization
Re-rank the results returned by a search engine.
Improves the relevance ranking of search results.
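The re-ranking step itself is simple once a relevance score is available. Below is a minimal sketch of that step; the `score` function is a toy lexical-overlap stand-in (an assumption for illustration, not the model's actual scoring, which would come from the reranker itself), so the example runs without downloading weights:

```python
# Sketch of re-ranking retrieved documents by a relevance score.
# `score` is a hypothetical stand-in; in practice it would be backed by
# Qwen3-Reranker-4B scoring each (query, document) pair.

def score(query: str, doc: str) -> float:
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query: str, docs: list[str]) -> list[str]:
    # Highest-scoring documents first.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

query = "quantized reranker memory usage"
candidates = [
    "A history of search engines",
    "GPTQ reduces reranker memory usage",
    "Quantized models trade accuracy for memory",
]
for doc in rerank(query, candidates):
    print(f"{score(query, doc):.2f}  {doc}")
```

Swapping the toy scorer for real reranker scores leaves the surrounding pipeline unchanged, which is what makes a drop-in quantized reranker attractive.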
Recommendation System
Recommended Content Sorting
Optimize the ordering of content lists generated by a recommendation system.
Improves the relevance of recommended content and user satisfaction.