
Kimi Dev 72B GGUF

Developed by ubergarm
A quantized version of Kimi-Dev-72B, using advanced nonlinear optimal quantization and multi-head latent attention mechanism to reduce storage and computing requirements.
Downloads 2,780
Release Time: 6/19/2025

Model Overview

This model is a quantized version of Kimi-Dev-72B. Its quantization scheme cuts memory and compute requirements while largely preserving output quality, making it suitable for text generation tasks.
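A minimal usage sketch follows, assuming the GGUF file is loaded through the llama-cpp-python bindings; the file name, context size, and sampling settings are illustrative placeholders rather than values documented for this release, and a given quant may require a compatible llama.cpp build.

# Minimal sketch (assumption): load a GGUF quant and run one text
# generation request via llama-cpp-python. The file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-Dev-72B-quant.gguf",  # hypothetical local file name
    n_ctx=4096,                            # context window; size to available memory
    n_gpu_layers=-1,                       # offload all layers to GPU if VRAM allows
)

out = llm(
    "Explain the trade-offs of low-bit quantization.",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])

Lower-bit quants trade some output quality for smaller files and lower memory use, which is the balance this release targets.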

Model Features

Advanced quantization method
Non-linear optimal quantization combined with a multi-head latent attention mechanism significantly reduces the model's storage and compute requirements.
High-performance inference
On high-end hardware, with a 2k batch size, prompt processing (PP) reaches roughly 500 tokens/second and token generation (TG) roughly 5 tokens/second; see the timing sketch after this list.
Balanced quality and speed
A series of experimental quantization runs was used to strike a good balance between output quality and inference speed.
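The throughput figures above can be sanity-checked with a rough timing loop. This is a sketch under the assumption that the low-level tokenize/eval/sample helpers of llama-cpp-python are available; the model path is hypothetical, and absolute numbers depend entirely on hardware and GPU offloading.

# Rough PP/TG throughput check (sketch; assumptions as noted above).
import time
from llama_cpp import Llama

llm = Llama(model_path="Kimi-Dev-72B-quant.gguf", n_ctx=4096, n_gpu_layers=-1)

prompt = "Describe the difference between perplexity and KL divergence. " * 100
tokens = llm.tokenize(prompt.encode("utf-8"))

# Prompt processing (PP): evaluate the whole prompt in one pass.
t0 = time.time()
llm.eval(tokens)
pp_rate = len(tokens) / (time.time() - t0)

# Token generation (TG): sample and evaluate one token at a time.
n_gen = 64
t1 = time.time()
for _ in range(n_gen):
    llm.eval([llm.sample()])
tg_rate = n_gen / (time.time() - t1)

print(f"PP ~{pp_rate:.0f} tok/s, TG ~{tg_rate:.1f} tok/s")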

Model Capabilities

Text generation
Efficient inference
Quantized model support

Use Cases

Text generation
Efficient text generation
Reduces the model's storage and compute requirements while maintaining acceptable output quality, which suits scenarios that call for efficient text generation.
With a 2k batch size, prompt processing reaches roughly 500 tokens/second and token generation roughly 5 tokens/second.