Qwen3 Reranker 4B W4A16 G128
This model is the result of GPTQ quantization applied to Qwen/Qwen3-Reranker-4B, significantly reducing VRAM usage.
Release time: 6/7/2025
Model Overview
A quantized version of Qwen3-Reranker-4B, intended for text re-ranking tasks. GPTQ quantization improves its VRAM usage efficiency.
Model Features
VRAM Optimization
VRAM usage drops from roughly 17,430 MB to 11,000 MB (without FlashAttention-2), substantially improving resource efficiency.
Accuracy Maintenance
Despite the large reduction in VRAM usage, the expected accuracy loss is under 5%; the corresponding Embedding version loses only about 0.7%.
Quantization Technology
Uses GPTQ quantization (W4A16, group size 128) to compress the model and speed up inference.
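The savings from the W4A16-G128 configuration can be estimated with back-of-the-envelope arithmetic. This is a minimal sketch, assuming one fp16 scale and one 4-bit zero-point per 128-weight group (a common GPTQ layout, not stated in this card), and ignoring unquantized layers such as embeddings:

```python
# Rough weight-memory estimate for W4A16 with group size 128.
# Assumption: each group of 128 weights stores one fp16 scale (2 bytes)
# and one 4-bit zero-point (0.5 bytes); embeddings/activations ignored.

def w4a16_bytes_per_weight(group_size: int = 128) -> float:
    weight = 4 / 8           # 4-bit quantized weight
    scale = 2 / group_size   # one fp16 scale per group
    zero = 0.5 / group_size  # one 4-bit zero-point per group
    return weight + scale + zero

params = 4e9                              # ~4B parameters
fp16_gb = params * 2 / 1e9                # 16-bit baseline
w4_gb = params * w4a16_bytes_per_weight() / 1e9

print(f"fp16 weights: {fp16_gb:.1f} GB")         # 8.0 GB
print(f"W4A16-G128 weights: {w4_gb:.2f} GB")     # 2.08 GB
print(f"compression: {fp16_gb / w4_gb:.2f}x")    # 3.85x
```

The observed end-to-end VRAM drop (17,430 MB to 11,000 MB) is smaller than this ~3.85x weight-only compression because activations, the KV cache, and any unquantized layers are unaffected by weight quantization.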
Model Capabilities
Text Re-ranking
Text Relevance Scoring
Information Retrieval Optimization
Use Cases
Information Retrieval
Search Engine Result Optimization
Re-rank the results returned by a search engine.
Improves the relevance ranking of search results.
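The re-ranking step itself is simple once a relevance score is available. Below is a minimal sketch of that step; the `score` function is a toy lexical-overlap stand-in (an assumption for illustration, not the model's actual scoring, which would come from the reranker itself), so the example runs without downloading weights:

```python
# Sketch of re-ranking retrieved documents by a relevance score.
# `score` is a hypothetical stand-in; in practice it would be backed by
# Qwen3-Reranker-4B scoring each (query, document) pair.

def score(query: str, doc: str) -> float:
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query: str, docs: list[str]) -> list[str]:
    # Highest-scoring documents first.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

query = "quantized reranker memory usage"
candidates = [
    "A history of search engines",
    "GPTQ reduces reranker memory usage",
    "Quantized models trade accuracy for memory",
]
for doc in rerank(query, candidates):
    print(f"{score(query, doc):.2f}  {doc}")
```

Swapping the toy scorer for real reranker scores leaves the surrounding pipeline unchanged, which is what makes a drop-in quantized reranker attractive.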
Recommendation System
Recommended Content Sorting
Optimize the ordering of content lists generated by a recommendation system.
Improves the relevance of recommended content and user satisfaction.