Qwen3-Reranker-0.6B-W4A16-G128 Open-source Model - Optimize GPU memory usage with minimal accuracy loss

Qwen3 Reranker 0.6B W4A16 G128

Developed by boboliu

A GPTQ-quantized version of Qwen3-Reranker-0.6B that reduces GPU memory (VRAM) usage with only a small loss in accuracy

Text Classification

Transformers

Open Source License: Apache-2.0 #GPTQ Quantization #GPU Memory Optimization #Text Reranking

Downloads 151

Release Date: 6/7/2025

Model Overview

This is a GPTQ-quantized model based on Qwen/Qwen3-Reranker-0.6B. It is tagged as a text classification model and is mainly used to rerank text by relevance. Quantization substantially reduces GPU memory (VRAM) usage while keeping accuracy close to that of the base model.

Model Features

GPU Memory Optimization

GPU memory usage drops from 3228M to 2124M (without FlashAttention-2, FA2), a saving of roughly one third
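The size of that saving can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above; the function name `vram_reduction` is hypothetical and the unit is carried through exactly as the page reports it ("M").

```python
def vram_reduction(before_m: float, after_m: float) -> tuple[float, float]:
    """Return (absolute saving, saving as a fraction of the original usage)."""
    saved = before_m - after_m
    return saved, saved / before_m


# Figures quoted in the model description (measured without FA2).
saved, fraction = vram_reduction(3228, 2124)
print(f"saved {saved}M, i.e. {fraction:.1%} of the original footprint")
```

The fraction depends on batch size, sequence length, and runtime overhead, so it should be read as a single data point rather than a guaranteed ratio.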

Accuracy Preservation

The expected accuracy loss is below 5%. In the author's testing, the accuracy loss measured on the embedding model was only about 0.7%

Efficient Quantization

The model is quantized with GPTQ, using Ultrachat, T2Ranking, and COIG-CQIA as the calibration set
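The W4A16-G128 suffix in the model name is conventionally read as 4-bit weights, 16-bit activations, and a quantization group size of 128; the page does not spell this out, so treat that reading as an assumption. To illustrate what group-wise 4-bit storage means, here is a minimal pure-Python sketch of round-to-nearest quantization. It is not GPTQ itself (GPTQ additionally uses calibration data to compensate for rounding error) and it is not the author's pipeline; all function names are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    levels = (1 << bits) - 1              # 15 distinct steps for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0     # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups, each with its own scale and offset."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]   # toy weights, not real model data
groups = quantize_row(row)
restored = [w for g in groups for w in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"{len(groups)} groups, max reconstruction error {max_err:.6f}")
```

Each group stores its 4-bit codes plus one scale and one offset, which is where the memory saving comes from; smaller groups track the weights more closely at the cost of more per-group metadata.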

Model Capabilities

Text Classification

Text Reranking

Use Cases

Information Retrieval

Search Result Reranking

Rerank the results returned by a search engine so that the most relevant ones appear first
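The reranking step itself is simple once a relevance score is available for each query and document pair. The sketch below shows only that step; `rerank` and `toy_overlap_score` are hypothetical names, and the word-overlap scorer is a stand-in so the example runs without downloading anything. In practice `score_fn` would wrap a call to the reranker model, which this page does not document.

```python
from typing import Callable, List, Optional, Tuple


def rerank(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def toy_overlap_score(query: str, doc: str) -> float:
    """Placeholder scorer: fraction of query words that also occur in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


search_results = [
    "how to bake sourdough bread",
    "gptq quantization reduces gpu memory usage",
    "quantization basics",
]
ranked = rerank("gptq quantization memory", search_results, toy_overlap_score)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

Keeping the scorer as a parameter means the same function works whether scores come from the quantized model, the unquantized base model, or a cheap heuristic used for testing.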

Text Processing

Document Classification

Automatically classify a large number of documents

Property	Details
Base Model	Qwen/Qwen3-Reranker-0.6B
Pipeline Tag	text-classification
Tags	transformers
License	Apache-2.0
