DeepSeek-R1-0528-quantized.w4a16
A quantized version of DeepSeek-R1-0528 that substantially reduces GPU memory and disk space requirements by quantizing the weights to the INT4 data type.
Release Time: 5/30/2025
Model Overview
This model is a quantized version of DeepSeek-R1-0528, intended mainly for text generation tasks. Weight quantization improves resource-utilization efficiency.
Model Features
INT4 Weight Quantization
Reduces the weights from 8-bit to 4-bit precision, cutting GPU memory and disk space requirements by approximately 50%.
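The ~50% saving follows directly from the storage arithmetic: two 4-bit values fit in one byte. This is not the model's actual quantization kernel, just a minimal NumPy sketch of the packing idea (function names are illustrative):

```python
import numpy as np

def pack_int4(weights):
    """Pack pairs of 4-bit values (stored as uint8 in [0, 15]) into single bytes."""
    assert weights.size % 2 == 0
    w = weights.astype(np.uint8)
    # High nibble from even positions, low nibble from odd positions.
    return (w[0::2] << 4) | (w[1::2] & 0x0F)

def unpack_int4(packed):
    """Recover the original 4-bit values from packed bytes."""
    hi = (packed >> 4) & 0x0F
    lo = packed & 0x0F
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = hi
    out[1::2] = lo
    return out

w = np.array([3, 15, 0, 7, 9, 1], dtype=np.uint8)
packed = pack_int4(w)
assert packed.nbytes == w.nbytes // 2            # half the storage of 8-bit
assert np.array_equal(unpack_int4(packed), w)    # lossless round trip of the 4-bit codes
```

Real w4a16 schemes additionally store per-group scales and zero-points, so the achieved saving is slightly under the ideal 50%.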
Efficient Deployment
Supports efficient deployment with the vLLM backend for optimized inference throughput.
High-performance Inference
Performs close to the original model on multiple reasoning benchmarks, with minimal accuracy loss.
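The vLLM deployment mentioned above can be sketched as a standard `vllm serve` launch. The model ID and flag values here are illustrative assumptions, not confirmed by this page, and a multi-GPU node is required for a model of this size:

```shell
# Launch an OpenAI-compatible API server for the INT4-quantized checkpoint.
# Model ID, tensor-parallel size, and context length are assumptions.
vllm serve RedHatAI/DeepSeek-R1-0528-quantized.w4a16 \
  --tensor-parallel-size 8 \
  --max-model-len 32768
```

Once the server is up, any OpenAI-compatible client can send chat-completion requests to it.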
Model Capabilities
Text Generation
Efficient Inference
Use Cases
Academic Research
Mathematical Problem Solving
Used to solve complex mathematical problems, such as those in the MATH-500 dataset.
pass@1 accuracy of 97.40%
General Knowledge Q&A
Answers high-difficulty questions from the GPQA Diamond dataset.
pass@1 accuracy of 80.61%
Education
AIME Contest Question Answering
Generates answers to questions from the American Invitational Mathematics Examination (AIME).
pass@1 accuracy of 87.33%
© 2025 AIbase