
DeepSeek R1 Distill Qwen 32B Quantized.w8a8

Developed by neuralmagic
INT8 quantized version of DeepSeek-R1-Distill-Qwen-32B, reducing VRAM usage and improving computational efficiency through weight and activation quantization.
Downloads: 2,324
Release date: 2/5/2025

Model Overview

Quantized model based on DeepSeek-R1-Distill-Qwen-32B, optimized with INT8 quantization for weights and activations, significantly lowering VRAM requirements and boosting inference speed.

Model Features

INT8 Quantization
Both weights and activations use INT8 quantization, reducing GPU VRAM usage by approximately 50% and improving matrix multiplication throughput by about 2x.
Efficient Inference
Supports efficient deployment via vLLM backend, optimizing inference performance for large-scale language models.
High Accuracy Retention
The quantized model maintains over 99% of the original model's accuracy across multiple benchmarks.
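The features above rest on symmetric INT8 quantization: each FP16/FP32 tensor is stored as 8-bit integers plus a scale, halving weight memory. A minimal per-tensor sketch of the general W8A8 idea (not Neural Magic's exact calibration recipe, which this card does not detail):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x ≈ q * scale."""
    scale = np.abs(x).max() / 127.0                       # one scale for the tensor
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Illustrative weight matrix (size chosen for speed, not the model's real shapes)
w = np.random.randn(1024, 1024).astype(np.float32)
q, s = quantize_int8(w)

# INT8 uses 1 byte per element vs 2 for FP16 — the source of the ~50% VRAM saving
fp16_bytes = w.astype(np.float16).nbytes
int8_bytes = q.nbytes
print(int8_bytes / fp16_bytes)  # → 0.5

# Rounding error is bounded by one quantization step
err = np.abs(dequantize(q, s) - w).max()
```

In the real model, activations are also quantized to INT8 at runtime, which is what lets the GPU use INT8 matrix-multiply kernels for the throughput gain.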

Model Capabilities

Text generation
Dialogue systems
Code generation
Mathematical reasoning

Use Cases

Dialogue systems
Intelligent customer service
Used to build efficient intelligent customer service systems for handling user queries.
Supports multi-turn dialogues with fast response times.
Code generation
Programming assistance
Helps developers generate code snippets or solve programming problems.
Achieves 85.8% pass@1 on the HumanEval benchmark.
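For the deployment path the card mentions, serving via vLLM can be sketched as a single command. This is a hypothetical example, not a command from the card: it assumes the checkpoint is published under the Hugging Face ID `neuralmagic/DeepSeek-R1-Distill-Qwen-32B-quantized.w8a8` (inferred from the title and developer fields) and that vLLM is installed; w8a8 checkpoints typically load without extra flags because the quantization scheme is recorded in the model config.

```shell
# Hypothetical launch sketch; model ID and flag values are assumptions.
# Starts an OpenAI-compatible server on the default port (8000).
vllm serve neuralmagic/DeepSeek-R1-Distill-Qwen-32B-quantized.w8a8 \
  --max-model-len 4096
```

Clients can then send chat or code-generation requests to the server's OpenAI-compatible endpoints, which covers both the dialogue and programming-assistance use cases above.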