Qwen3 Reranker 4B W4A16 G128

Developed by boboliu
A GPTQ-quantized version of Qwen/Qwen3-Reranker-4B that significantly reduces VRAM usage.
Release date: 6/7/2025

Model Overview

A quantized version of Qwen3-Reranker-4B, used for text re-ranking (relevance scoring) tasks. VRAM efficiency is improved through GPTQ quantization.

Model Features

VRAM Optimization
VRAM usage drops from about 17,430 MB to 11,000 MB (without FlashAttention 2), greatly improving resource efficiency.
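Most of the savings come from storing weights in 4 bits instead of 16. A rough back-of-envelope estimate of weight memory alone (the parameter count and overhead layout below are assumptions for illustration; the measured figures above also include activations and KV cache):

```python
# Rough weight-memory comparison for a ~4B-parameter model.
# All numbers here are illustrative assumptions, not measurements.
PARAMS = 4e9          # assumed parameter count
FP16_BYTES = 2        # bytes per weight in FP16/BF16

GROUP = 128           # GPTQ group size (the "G128" in W4A16-G128)
BITS = 4              # W4: 4-bit weights

fp16_gb = PARAMS * FP16_BYTES / 1024**3

# 4-bit packed weights plus one FP16 scale and one FP16 zero-point per group
packed = PARAMS * BITS / 8
overhead = (PARAMS / GROUP) * (FP16_BYTES * 2)
w4_gb = (packed + overhead) / 1024**3

print(f"FP16 weights : {fp16_gb:.2f} GiB")
print(f"W4A16-G128   : {w4_gb:.2f} GiB")
```

Weights shrink to roughly a quarter of their FP16 size; the per-group scale/zero-point metadata adds only a few percent of overhead.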
Accuracy Maintenance
While significantly reducing VRAM usage, the expected accuracy loss is under 5%; the companion Embedding quantization shows a loss of only about 0.7%.
Quantization Technology
Uses GPTQ quantization (4-bit weights, 16-bit activations, group size 128) to compress the model and accelerate inference.
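To illustrate what the W4A16-G128 configuration means, the sketch below quantizes a row of random weights to 4-bit integers with one scale per 128-element group. This is a simplified symmetric round-to-nearest variant; GPTQ's actual algorithm additionally compensates quantization error layer by layer:

```python
import random

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(1024)]  # one weight row (toy data)

GROUP, BITS = 128, 4
qmax = 2 ** (BITS - 1) - 1  # symmetric 4-bit integer range: [-8, 7]

deq = []
for start in range(0, len(w), GROUP):
    group = w[start:start + GROUP]
    scale = max(abs(x) for x in group) / qmax          # one FP scale per group
    q = [max(-qmax - 1, min(qmax, round(x / scale)))   # 4-bit integer codes
         for x in group]
    deq.extend(v * scale for v in q)                   # dequantize back to FP

err = max(abs(a - b) for a, b in zip(w, deq))
print(f"max abs quantization error: {err:.4f}")
```

Smaller groups give each scale less dynamic range to cover, which is why G128 trades a small metadata overhead for noticeably lower quantization error than per-tensor scaling.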

Model Capabilities

Text Re-ranking
Text Relevance Scoring
Information Retrieval Optimization
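The re-ranking workflow these capabilities support can be sketched as follows. The scorer here is a trivial word-overlap placeholder standing in for the model; the real Qwen3-Reranker-4B produces a learned relevance score for each (query, document) pair:

```python
def score(query: str, doc: str) -> float:
    # Placeholder scorer: fraction of query words found in the document.
    # In practice, Qwen3-Reranker-4B would compute this relevance score.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query: str, docs: list[str]) -> list[str]:
    # Re-order candidate documents by descending relevance to the query.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

query = "reduce gpu vram usage"
candidates = [
    "how to bake sourdough bread",
    "tips to reduce VRAM usage on a GPU",
    "GPU drivers and usage statistics",
]
ranked = rerank(query, candidates)
print(ranked[0])  # the most relevant candidate
```

A first-stage retriever (e.g. BM25 or an embedding model) typically supplies the candidate list; the reranker then refines the ordering of that short list.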

Use Cases

Information Retrieval
Search Engine Result Optimization
Re-rank the results returned by a search engine to improve the relevance ranking.
Recommendation System
Recommended Content Sorting
Re-order the content list generated by a recommendation system to improve relevance and user satisfaction.