# 🚀 NVIDIA DeepSeek R1 FP4 Model
The NVIDIA DeepSeek R1 FP4 model is a quantized version of DeepSeek AI's DeepSeek R1, an auto-regressive language model with an optimized transformer architecture. It was quantized using the [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) and is available for both commercial and non-commercial use.
## 🚀 Quick Start
### Deploy with TensorRT-LLM
To deploy the quantized FP4 checkpoint with the [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) LLM API, use the following example code (requires 8x B200 GPUs and TensorRT-LLM built from source from the latest main branch):
```python
from tensorrt_llm import SamplingParams
from tensorrt_llm._torch import LLM


def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(max_tokens=32)

    # Load the FP4 checkpoint across 8 GPUs with tensor parallelism.
    llm = LLM(model="nvidia/DeepSeek-R1-FP4", tensor_parallel_size=8, enable_attention_dp=True)

    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == '__main__':
    main()
```
## ✨ Features
- Quantized Model: Quantized to the FP4 data type, reducing disk size and GPU memory requirements.
- Commercial Use: Available for both commercial and non-commercial applications.
- High-Performance Runtime: Supported by the TensorRT-LLM runtime engine.
## 📦 Installation
No separate installation steps are required beyond TensorRT-LLM itself, which must be built from source from the latest main branch (see the Quick Start section above).
## 💻 Usage Examples
### Basic Usage
Basic usage is demonstrated in the deployment example above, which generates text for a batch of prompts with TensorRT-LLM. Decoding behavior can be tuned through `SamplingParams`, as sketched below.
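As a minimal sketch of tuning decoding (assuming `SamplingParams` exposes the common `temperature` and `top_p` fields in addition to the `max_tokens` used above; verify the exact signature against your TensorRT-LLM build):

```python
from tensorrt_llm import SamplingParams

# Assumption: temperature/top_p are accepted by SamplingParams in your
# TensorRT-LLM version; only max_tokens appears in the example above.
greedy = SamplingParams(max_tokens=32, temperature=0.0)                # deterministic decoding
sampled = SamplingParams(max_tokens=128, temperature=0.8, top_p=0.95)  # nucleus sampling
```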
### Evaluation
The accuracy benchmark results are presented in the table below:
| Precision | MMLU | GSM8K | AIME2024 | GPQA Diamond | MATH-500 |
|-----------|------|-------|----------|--------------|----------|
| FP8       | 90.8 | 96.3  | 80.0     | 69.7         | 95.4     |
| FP4       | 90.7 | 96.1  | 80.0     | 69.2         | 94.2     |
## 📚 Documentation
### Model Overview
The NVIDIA DeepSeek R1 FP4 model is the quantized version of DeepSeek AI's DeepSeek R1, which uses an optimized transformer architecture. For more information, see the [DeepSeek R1 model card](https://huggingface.co/deepseek-ai/DeepSeek-R1).
### Third-Party Community Consideration
This model is not owned or developed by NVIDIA; it was developed and built to a third-party's requirements. See the Non-NVIDIA [DeepSeek R1 Model Card](https://huggingface.co/deepseek-ai/DeepSeek-R1).
### Model Architecture
| Property | Details |
|----------|---------|
| Model Type | Transformers |
| Network Architecture | DeepSeek R1 |
### Input
| Property | Details |
|----------|---------|
| Input Type(s) | Text |
| Input Format(s) | String |
| Input Parameters | 1D (One Dimensional): Sequences |
| Other Properties Related to Input | Context length up to 128K |
### Output
| Property | Details |
|----------|---------|
| Output Type(s) | Text |
| Output Format | String |
| Output Parameters | 1D (One Dimensional): Sequences |
| Other Properties Related to Output | N/A |
### Software Integration
| Property | Details |
|----------|---------|
| Supported Runtime Engine(s) | TensorRT-LLM |
| Supported Hardware Microarchitecture Compatibility | NVIDIA Blackwell |
| Preferred Operating System(s) | Linux |
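Before launching the 8-GPU deployment from the Quick Start, a quick preflight check can catch mismatched hardware early. This is a sketch assuming Blackwell parts such as B200 report CUDA compute capability 10.x; verify against your driver and CUDA stack.

```python
import torch

# Sketch: verify GPU count and architecture before the 8-way tensor-parallel run.
# Assumption: Blackwell (e.g., B200) reports compute capability major == 10.
n_gpus = torch.cuda.device_count()
assert n_gpus >= 8, f"tensor_parallel_size=8 needs 8 GPUs, found {n_gpus}"
major, minor = torch.cuda.get_device_capability(0)
assert major >= 10, f"FP4 inference targets Blackwell; found sm_{major}{minor}"
print(f"OK: {n_gpus} GPUs, compute capability {major}.{minor}")
```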
### Model Version(s)
The model is quantized with nvidia-modelopt v0.23.0.
### Datasets
| Dataset Type | Details |
|--------------|---------|
| Calibration Dataset | cnn_dailymail (data collection method: Automated; labeling method: Unknown) |
| Evaluation Dataset | MMLU (data collection method: Unknown; labeling method: N/A) |
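For context, calibration-based post-training quantization with TensorRT Model Optimizer generally follows the pattern below. This is an illustrative sketch only: `load_model()` and `calib_loader` are hypothetical placeholders, and the `NVFP4_DEFAULT_CFG` config name is an assumption to check against the modelopt release in use (this checkpoint was produced with nvidia-modelopt v0.23.0).

```python
import modelopt.torch.quantization as mtq

# Illustrative sketch of FP4 post-training quantization with ModelOpt.
# `load_model()` and `calib_loader` are hypothetical placeholders, and
# NVFP4_DEFAULT_CFG is an assumed config name; consult the TensorRT Model
# Optimizer docs for the release you use.
model = load_model()  # hypothetical: the full-precision DeepSeek R1

def forward_loop(m):
    # Feed calibration samples (e.g., cnn_dailymail) through the model so
    # activation ranges can be observed before choosing FP4 scales.
    for batch in calib_loader:  # hypothetical dataloader
        m(**batch)

model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop=forward_loop)
```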
### Inference
| Property | Details |
|----------|---------|
| Engine | TensorRT-LLM |
| Test Hardware | B200 |
### Post Training Quantization
This model was obtained by quantizing the weights and activations of DeepSeek R1 to the FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within the transformer blocks are quantized. This optimization reduces the number of bits per parameter from 8 to 4, cutting disk size and GPU memory requirements by approximately 1.6x.
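A back-of-the-envelope check of the ~1.6x figure (a sketch; the fraction of parameters sitting in quantized linear layers is an illustrative assumption, not a published number): if a fraction f of parameters drops from 8 to 4 bits while the rest stays at 8 bits, the overall reduction is 1 / (f/2 + (1 - f)), which reaches 1.6x at f ≈ 0.75.

```python
# Sketch: overall size reduction when a fraction f of parameters is quantized
# from 8 bits to 4 bits and the remaining (1 - f) stays at 8 bits.
def compression_ratio(f: float) -> float:
    return 1.0 / (0.5 * f + (1.0 - f))

for f in (0.5, 0.75, 0.9):
    print(f"quantized fraction {f:.2f} -> {compression_ratio(f):.2f}x smaller")
# f = 0.75 yields exactly 1.6x, consistent with the ~1.6x reported above.
```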
### Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and has established policies and practices for AI development. Developers should ensure this model meets industry requirements and address potential misuse. Report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
## 🔧 Technical Details
The quantization process converts the weights and activations of the DeepSeek R1 model to the FP4 data type. Only the linear operators within the transformer blocks have their weights and activations quantized, reducing the bits per parameter from 8 to 4 and yielding an approximately 1.6x reduction in disk size and GPU memory requirements.
## 📄 License
This model is released under the MIT license.