
QwQ-32B-FP8

Developed by qingcheng-ai
QwQ-32B-FP8 is the FP8-quantized version of the QwQ-32B model; it maintains nearly the same accuracy as the BF16 version while enabling faster inference.
Downloads 144
Release Time: 3/21/2025

Model Overview

The FP8-quantized version of the QwQ-32B model, suited to efficient inference workloads, with performance comparable to the original BF16 version.
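To illustrate what FP8 quantization does to individual weights, here is a minimal, self-contained sketch that rounds values to the nearest representable FP8 E4M3 number with per-tensor scaling. This mirrors the number format commonly used for FP8 checkpoints, but it is only an assumption-laden illustration: the actual quantizer, scaling granularity, and inference kernels used for QwQ-32B-FP8 are not documented here.

```python
# Enumerate every finite value representable in FP8 E4M3 (OCP spec:
# 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits; the
# all-ones exponent + all-ones mantissa pattern encodes NaN).
E4M3_VALUES = []
for e in range(16):
    for m in range(8):
        if e == 15 and m == 7:
            continue  # NaN encoding, not a finite value
        if e == 0:
            v = (m / 8) * 2.0 ** -6           # subnormal range
        else:
            v = (1 + m / 8) * 2.0 ** (e - 7)  # normal range
        E4M3_VALUES.extend([v, -v])

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable E4M3 value."""
    return min(E4M3_VALUES, key=lambda v: abs(v - x))

# Per-tensor scaling: map the largest weight magnitude onto the
# E4M3 maximum (448), quantize, then rescale on the way back.
# The weight values below are made up for illustration.
weights = [0.013, -0.27, 0.5, 1.9, -3.1]
scale = max(abs(w) for w in weights) / 448.0
dequantized = [quantize_e4m3(w / scale) * scale for w in weights]
```

With only a 3-bit mantissa, each weight is stored with at most a few percent of relative error, which is why accuracy stays close to the BF16 original while every weight drops from 16 bits to 8.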

Model Features

Efficient inference
The FP8-quantized version delivers faster inference while maintaining accuracy nearly identical to the BF16 version.
High performance
Strong results on the MMLU benchmark, matching the score of the original BF16 version.
Lightweight
FP8 quantization roughly halves the weight footprint relative to BF16, making the model suitable for resource-constrained environments.
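A back-of-envelope calculation makes the size reduction concrete. The parameter count below is an assumed round figure (~32.5B for QwQ-32B); real checkpoints keep some layers in higher precision and also need memory for activations and the KV cache, so actual footprints will differ.

```python
# Rough weight-memory estimate: bytes per weight times parameter count.
PARAMS = 32.5e9              # assumed approximate QwQ-32B parameter count
bf16_gb = PARAMS * 2 / 1e9   # bfloat16: 2 bytes per weight -> ~65 GB
fp8_gb = PARAMS * 1 / 1e9    # FP8: 1 byte per weight -> ~32.5 GB
print(f"BF16 weights: ~{bf16_gb:.0f} GB, FP8 weights: ~{fp8_gb:.1f} GB")
```

Halving the weight memory is what lets the model fit on fewer or smaller GPUs, and the narrower reads also raise effective memory bandwidth during inference.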

Model Capabilities

Text generation
Efficient inference

Use Cases

Natural language processing
Question answering system
Can be used to build high-performance question answering systems that handle complex queries; the model scored 61.2 on the MMLU benchmark.
Text generation
Suitable for various text generation tasks, such as content creation, summarization, etc.