
Qwen3 32B FP8 Dynamic

Developed by RedHatAI
An efficient language model based on Qwen3-32B with FP8 dynamic quantization, significantly reducing memory requirements and improving computational efficiency
Downloads 917
Release Date: 5/2/2025

Model Overview

This model is obtained by quantizing the weights and activations of Qwen3-32B to the FP8 data type, reducing GPU memory requirements by approximately 50% and roughly doubling matrix-multiplication throughput. It is suitable for tasks such as reasoning, function calling, and multilingual instruction following.
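"Dynamic" FP8 quantization computes one scale per token at inference time instead of calibrating activation scales offline. A minimal NumPy sketch of this per-token scheme, assuming the E4M3 format (max representable value 448) and ignoring subnormals and exponent underflow, might look like:

```python
# Minimal sketch of dynamic (per-token) FP8 activation quantization.
# Assumes the E4M3 format (4 exponent bits, 3 mantissa bits, max 448);
# subnormals and the exponent range are ignored for simplicity.
import numpy as np

FP8_E4M3_MAX = 448.0

def fake_cast_e4m3(v: np.ndarray) -> np.ndarray:
    """Round to 3 explicit mantissa bits, emulating an FP8 E4M3 cast."""
    m, e = np.frexp(v)             # v = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0  # keep 1 implicit + 3 explicit bits
    return np.clip(m * 2.0 ** e, -FP8_E4M3_MAX, FP8_E4M3_MAX)

def quantize_dynamic_fp8(x: np.ndarray):
    """Quantize each row (token) of x with its own runtime scale."""
    scales = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scales = np.maximum(scales, 1e-12)  # guard against all-zero rows
    return fake_cast_e4m3(x / scales), scales

x = np.random.randn(4, 16).astype(np.float32)  # 4 "tokens", 16 channels
q, s = quantize_dynamic_fp8(x)
x_hat = q * s  # dequantized activations stay within ~6% of the originals
```

In the deployed model the FP8 values feed hardware FP8 tensor cores; this emulation only illustrates the per-token scaling and the coarse (3-mantissa-bit) rounding granularity that the memory and throughput gains come from.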

Model Features

FP8 Quantization
Weights and activations are quantized to the FP8 data type, significantly reducing memory requirements and improving computational efficiency
Efficient Deployment
Supports efficient deployment via the vLLM backend for optimized inference performance
High Accuracy Retention
The quantized model retains over 99% of the original model's accuracy across multiple benchmarks
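Deployment with vLLM typically follows the standard OpenAI-compatible serving flow. A sketch, assuming the Hugging Face repo id `RedHatAI/Qwen3-32B-FP8-dynamic` and illustrative flags (tune parallelism to your hardware):

```shell
# Serve the quantized model with vLLM's OpenAI-compatible server.
# --tensor-parallel-size is illustrative; size it to your GPUs.
vllm serve RedHatAI/Qwen3-32B-FP8-dynamic --tensor-parallel-size 2

# Query it through the standard chat completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "RedHatAI/Qwen3-32B-FP8-dynamic",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the server exposes the OpenAI API, any OpenAI-compatible client can talk to it by pointing its base URL at the server.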

Model Capabilities

Text generation
Function calling
Multilingual instruction following
Translation
Reasoning task processing

Use Cases

General AI Assistant
Knowledge Q&A
Answering various knowledge-based questions
Achieved a score of 80.89 in MMLU (5-shot) testing
Mathematical Reasoning
Solving math problems and logical reasoning
Achieved a score of 88.32 in GSM-8K testing
Professional Domain Applications
Competition Mathematics
Solving competition-level math problems
Achieved a score of 79.37 in AIME 2024 testing
Code Generation
Generating code based on descriptions
Performs well in code generation tasks