
Qwen3-32B-quantized.w4a16

Developed by RedHatAI
An INT4-quantized version of Qwen3-32B that reduces disk and GPU memory requirements by roughly 75% through weight-only quantization while maintaining accuracy close to the original model
Downloads 2,213
Release Time: 5/5/2025

Model Overview

A quantized version of Qwen3-32B, suitable for text generation, function calling, and multilingual tasks, with support for efficient inference
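
The roughly 75% memory reduction quoted above follows from simple byte arithmetic. The sketch below is a back-of-the-envelope estimate that counts only weight storage and ignores the KV cache, activations, and quantization metadata such as scales and zero-points.

```python
# Rough weight-memory arithmetic behind the ~75% figure quoted above.
# Counts weights only; KV cache, activations, and quantization scales are ignored.
params = 32e9                      # ~32B parameters
bf16_gb = params * 2 / 1e9         # 16-bit weights: 2 bytes/param -> ~64 GB
int4_gb = params * 0.5 / 1e9       # 4-bit weights: 0.5 bytes/param -> ~16 GB
print(f"BF16 ≈ {bf16_gb:.0f} GB, INT4 ≈ {int4_gb:.0f} GB, "
      f"saving ≈ {1 - int4_gb / bf16_gb:.0%}")
```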

Model Features

Efficient quantization
Uses INT4 weight-only quantization (4-bit weights, 16-bit activations) to reduce disk and GPU memory requirements by roughly 75%
High performance retention
The quantized model retains over 99% of the original model's scores across multiple benchmarks
Multilingual support
Supports instruction following and translation tasks in multiple languages
Efficient inference
Optimized for deployment on efficient inference frameworks such as vLLM (see the sketch after this list)
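
A minimal offline-inference sketch with vLLM. The repository id below is an assumption taken from the listing title; confirm the exact name on Hugging Face before use.

```python
from vllm import LLM, SamplingParams

# Assumed repo id; verify the published name before running.
llm = LLM(model="RedHatAI/Qwen3-32B-quantized.w4a16", max_model_len=4096)
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)

outputs = llm.generate(
    ["Summarize the benefits of INT4 weight-only quantization."],
    sampling,
)
print(outputs[0].outputs[0].text)
```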

Model Capabilities

Text generation
Function calling (see the sketch after this list)
Multilingual instruction following
Translation
Domain fine-tuning
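
To illustrate the function-calling capability, here is a sketch of a tool-calling request against a vLLM OpenAI-compatible server. The server flags, the repository id, and the get_weather tool are illustrative assumptions, not part of this listing.

```python
# Hypothetical tool-calling request against a local vLLM OpenAI-compatible server,
# e.g. started with:
#   vllm serve RedHatAI/Qwen3-32B-quantized.w4a16 \
#     --enable-auto-tool-choice --tool-call-parser hermes
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Illustrative tool schema; not shipped with the model.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="RedHatAI/Qwen3-32B-quantized.w4a16",  # assumed repo id
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```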

Use Cases

General reasoning
Knowledge Q&A
Answers various knowledge-based questions
Scores 80.36 on the MMLU benchmark
Mathematical reasoning
Solves mathematical problems
Scores 85.97 on the GSM-8K benchmark
Professional applications
Domain expert
Can be fine-tuned to serve as a domain expert
Code generation
Generates programming code