
QwQ-32B FP8 Dynamic

Developed by nm-testing
FP8-quantized version of QwQ-32B that reduces storage and memory requirements by roughly 50% through dynamic quantization while retaining 99.75% of the original model's accuracy
Downloads 3,895
Release Date: 3/5/2025

Model Overview

An FP8-quantized version of Qwen/QwQ-32B, built for efficient inference deployment and optimized for the vLLM backend.
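A minimal vLLM loading sketch. This assumes the checkpoint is published on the Hugging Face Hub under a repository name like `nm-testing/QwQ-32B-FP8-dynamic` (the exact name may differ); the FP8 quantization metadata is read from the checkpoint's config, so no extra quantization flag is needed:

```python
from vllm import LLM, SamplingParams

# Load the FP8-quantized checkpoint; quantization settings come from
# the model's config files (repo name is an assumption, adjust as needed).
llm = LLM(model="nm-testing/QwQ-32B-FP8-dynamic")

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain FP8 dynamic quantization briefly."], params)
print(outputs[0].outputs[0].text)
```

Running a 32B model requires a GPU with sufficient memory; the ~50% footprint reduction from FP8 is what makes single-node deployment of a model this size practical.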

Model Features

FP8 Dynamic Quantization
Both weights and activations use FP8 quantization, reducing storage and memory requirements by approximately 50%
High Accuracy Retention
Comprehensive benchmarks show 99.75% retention of the original model's accuracy, with some metrics even improving
vLLM Optimization
Optimized for the vLLM inference framework, supporting efficient parallel inference
Quantization Scheme Optimization
Weights use per-channel symmetric quantization, while activations use dynamic per-token symmetric quantization
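The scheme above can be sketched in plain NumPy. This is a simplified illustration of the scaling pattern only: real FP8 quantization also rounds values to the FP8 E4M3 representable grid, which is elided here. The function names are illustrative, not from any library:

```python
import numpy as np

FP8_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_weights_per_channel(w):
    # Per-channel symmetric quantization: one scale per output channel (row),
    # computed once, offline, from the weight tensor itself.
    scale = np.max(np.abs(w), axis=1, keepdims=True) / FP8_MAX
    scale = np.maximum(scale, 1e-12)           # guard against all-zero rows
    q = np.clip(w / scale, -FP8_MAX, FP8_MAX)  # scaled into the FP8 range
    return q, scale

def quantize_activations_per_token(x):
    # Per-token dynamic symmetric quantization: one scale per token (row),
    # computed at runtime from each token's max absolute activation.
    scale = np.max(np.abs(x), axis=-1, keepdims=True) / FP8_MAX
    scale = np.maximum(scale, 1e-12)
    q = np.clip(x / scale, -FP8_MAX, FP8_MAX)
    return q, scale

# Dequantization multiplies back by the stored scale; since this sketch
# skips FP8 rounding, the round trip is exact.
w = np.random.randn(4, 8).astype(np.float32)
qw, sw = quantize_weights_per_channel(w)
assert np.allclose(qw * sw, w, atol=1e-5)
```

"Dynamic" refers to the activation path: because each token's scale is derived on the fly from its own values, no calibration dataset is needed, at the cost of a small runtime max-reduction per token.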

Model Capabilities

Chinese text generation
Multi-turn dialogue
Complex reasoning
Knowledge Q&A

Use Cases

Intelligent dialogue
Personalized role-playing
Simulates specific character styles in dialogue, such as a pirate's tone
Achieves stylized expression while maintaining semantic accuracy
Educational assistance
Mathematical problem solving
Solves math problems at high-school level and above
Achieves 97.44% accuracy on the MATH-500 test
Professional consultation
Professional domain Q&A
Answers expert-level questions from the GPQA Diamond benchmark
Achieves 63.21% accuracy