
DeepSeek R1 0528 AWQ

Developed by cognitivecomputations
AWQ-quantized version of DeepSeek R1 0528, supports full-context-length operation on 8x80GB GPUs using vLLM.
Downloads: 145
Release date: 6/1/2025

Model Overview

This is an AWQ-quantized version of the DeepSeek-R1-0528 model, with modified model code that fixes float16 overflow issues and improves runtime efficiency under the vLLM framework.
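For illustration, serving a quant like this with vLLM typically looks like the sketch below. The repository name and flag values are assumptions for this example, not taken from this page; only `--tensor-parallel-size` and `--max-model-len` are standard vLLM options:

```shell
# Hypothetical launch command; the model ID and the context limit
# are assumptions, not values stated on this page.
# Shards the AWQ model across 8 GPUs with tensor parallelism.
vllm serve cognitivecomputations/DeepSeek-R1-0528-AWQ \
    --tensor-parallel-size 8 \
    --max-model-len 65536 \
    --trust-remote-code
```

This starts an OpenAI-compatible HTTP server; clients then send completion requests to it rather than loading the weights themselves.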

Model Features

AWQ Quantization Optimization
Modified model code to fix float16 overflow issues and improve runtime efficiency.
Full Context Length Support
Supports full-context-length operation on 8x80GB GPUs using vLLM.
High-performance Inference
An optimized FlashMLA implementation for A100 GPUs outperforms the Triton kernel in long-context inference.
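As a rough sanity check on the 8x80GB claim, the arithmetic below estimates the weight footprint. Both inputs are assumptions not stated on this page: DeepSeek-R1's widely reported ~671B total parameters, and ~4 bits per weight as a typical AWQ width.

```python
# Back-of-the-envelope VRAM estimate; the parameter count and the
# quantization bit width are assumptions, not figures from this page.
TOTAL_PARAMS = 671e9    # approx. DeepSeek-R1 total parameters
BITS_PER_WEIGHT = 4     # typical AWQ quantization width
GPU_MEM_GB = 80
NUM_GPUS = 8

weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9  # bytes -> GB
total_mem_gb = GPU_MEM_GB * NUM_GPUS

# Whatever is left over holds the KV cache and activations, which is
# what makes full-context-length serving feasible on this hardware.
headroom_gb = total_mem_gb - weights_gb
print(round(weights_gb, 1), round(headroom_gb, 1))
```

Under these assumptions the weights take roughly 335 GB, leaving about 300 GB of the 640 GB total for KV cache, which is consistent with the full-context-length claim above.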

Model Capabilities

Text generation
Long-text processing
Multilingual support

Use Cases

Text generation
Long-text generation
Supports text generation with up to 63K input tokens and 2K output tokens.
Achieves 54.3 TPS on an 8x H100/H200 configuration.
Batch processing
Supports batches of 32 concurrent requests, each with 4K input tokens and 256 output tokens.
Achieves 30.1 TPS on an 8x H100/H200 configuration.
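For intuition, the quoted long-generation figure translates into wall-clock time as follows. This assumes "TPS" means output tokens per second for the whole system, which the page does not specify:

```python
# Hedged estimate: assumes the quoted TPS counts output tokens
# across the whole 8-GPU system, which this page does not specify.
long_gen_tps = 54.3
output_tokens = 2_000  # the 2K-output scenario above

single_response_s = output_tokens / long_gen_tps
print(round(single_response_s, 1))  # seconds to stream one full 2K response
```

Under that reading, a full 2K-token response takes on the order of 37 seconds to generate end to end.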