DeepSeek-R1-Distill-Llama-70B-FP8-dynamic

Developed by RedHatAI
An FP8-quantized version of DeepSeek-R1-Distill-Llama-70B that improves inference performance by reducing the bit width of weights and activations.
Downloads 45.77k
Release Date: 2/1/2025

Model Overview

This is an FP8-quantized version of DeepSeek-R1-Distill-Llama-70B. Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements while significantly improving inference performance.

Model Features

FP8 Quantization
Both weights and activations are quantized using the FP8 data type, reducing disk size and GPU memory requirements by 50%.
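As a back-of-the-envelope illustration of that 50% figure, the following sketch estimates weight storage from parameter count and byte width alone; the numbers are approximate and deliberately ignore KV cache, activations, and runtime overhead:

```python
# Rough memory-footprint estimate for a 70B-parameter model.
# Assumption (for illustration only): weights dominate memory; KV cache,
# activations, and CUDA context are ignored.
PARAMS = 70e9

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes."""
    return params * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(PARAMS, 2.0)  # BF16/FP16: 2 bytes per parameter
fp8_gb = weight_memory_gb(PARAMS, 1.0)   # FP8: 1 byte per parameter

print(f"FP16 weights: ~{fp16_gb:.0f} GB")              # ~140 GB
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")               # ~70 GB
print(f"Reduction:    {1 - fp8_gb / fp16_gb:.0%}")     # 50%
```

Halving the bytes per parameter halves the weight footprint, which is where the stated disk and GPU memory savings come from.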
Efficient Inference
Achieves up to 1.4x speedup in single-stream deployment and up to 3.0x speedup in multi-stream asynchronous deployment.
vLLM Compatibility
Supports efficient deployment through the vLLM backend and exposes an OpenAI-compatible service interface.
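A minimal deployment sketch using vLLM's standard CLI. The tensor-parallel size is an assumption (4 GPUs, matching the A100x4 benchmark configuration); adjust it to your hardware.

```shell
# Serve the FP8 checkpoint with vLLM's OpenAI-compatible server.
# --tensor-parallel-size 4 assumes four GPUs; adjust to your setup.
vllm serve RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic \
  --tensor-parallel-size 4

# Query the OpenAI-compatible chat completions endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

Because the interface is OpenAI-compatible, existing OpenAI client libraries can point at the local server by changing only the base URL.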

Model Capabilities

Text Generation
Instruction Following
Multi-round Dialogue
Code Completion
Document Generation
RAG Application
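Multi-round dialogue over an OpenAI-compatible interface works by resending the accumulated conversation history on each turn. The sketch below shows only that message bookkeeping; the helper name is illustrative, not part of any official API, and the network call itself is omitted:

```python
# Illustrative multi-round dialogue history for an OpenAI-compatible API.
# A client would POST `history` to /v1/chat/completions on every turn.

def add_turn(messages: list, role: str, content: str) -> list:
    """Append one turn to the running conversation history."""
    messages.append({"role": role, "content": content})
    return messages

history = [{"role": "system", "content": "You are a helpful assistant."}]
add_turn(history, "user", "What is FP8 quantization?")
add_turn(history, "assistant", "FP8 stores values in 8-bit floating point.")
add_turn(history, "user", "How much memory does it save?")  # follow-up turn

# Each request carries the full history so the model sees prior turns.
print(len(history))  # 4 messages: 1 system + 2 user + 1 assistant
```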

Use Cases

Dialogue System
Multi-round Dialogue
Supports complex multi-round dialogue scenarios.
Reaches 8.90 QPS on 4x A100 GPUs with a 512-token prompt / 256-token generation configuration.
Code Generation
Code Completion
Supports code completion across programming languages.
Achieves 81.00% pass@1 on the HumanEval benchmark.
Information Retrieval
RAG Application
Supports question-answering systems based on retrieval-augmented generation.
Reaches 7.42 QPS on 4x A100 GPUs with a 1024-token prompt / 128-token generation configuration.
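The QPS figures above can be converted into an approximate generated-token throughput by multiplying by each configuration's output length; this is a rough estimate that ignores prompt-processing time:

```python
def generated_tokens_per_sec(qps: float, output_tokens: int) -> float:
    """Approximate generation throughput implied by a QPS figure."""
    return qps * output_tokens

# Dialogue benchmark: 8.90 QPS at 512 prompt / 256 output tokens.
print(generated_tokens_per_sec(8.90, 256))  # ~2278 generated tokens/s
# RAG benchmark: 7.42 QPS at 1024 prompt / 128 output tokens.
print(generated_tokens_per_sec(7.42, 128))  # ~950 generated tokens/s
```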