DeepSeek-V2-Lite-Chat-IMat-GGUF
Llama.cpp imatrix quantization of deepseek-ai/DeepSeek-V2-Lite-Chat
This project offers the llama.cpp imatrix quantization of the deepseek-ai/DeepSeek-V2-Lite-Chat model, providing various quantized versions for different usage scenarios.
Quick Start
Prerequisites
Ensure you have huggingface-cli installed. You can install it using the following command:
pip install -U "huggingface_hub[cli]"
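If the command is not found afterwards, you can confirm that the CLI is on your PATH with:
huggingface-cli --help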
Downloading the Model
To download a specific file, use the following command:
huggingface-cli download legraphista/DeepSeek-V2-Lite-Chat-IMat-GGUF --include "DeepSeek-V2-Lite-Chat.Q8_0.gguf" --local-dir ./
If the model is larger than 50GB and split into multiple files, download all parts to a local folder:
huggingface-cli download legraphista/DeepSeek-V2-Lite-Chat-IMat-GGUF --include "DeepSeek-V2-Lite-Chat.Q8_0/*" --local-dir DeepSeek-V2-Lite-Chat.Q8_0
# see the FAQ for merging split GGUFs
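If you only need a smaller quant, the same --include pattern works for any file listed under Documentation below; for example (assuming a Q4_K_M file is published in this repo):
huggingface-cli download legraphista/DeepSeek-V2-Lite-Chat-IMat-GGUF --include "DeepSeek-V2-Lite-Chat.Q4_K_M.gguf" --local-dir ./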
Features
- Multiple Quantized Versions: Offers a variety of quantized versions, including different bit depths and quantization types, to meet diverse performance and storage requirements.
- IMatrix Quantization: Some versions use an importance matrix (IMatrix) during quantization, which can improve output quality, particularly at lower bit depths (see the FAQ).
Installation
The installation mainly involves downloading the model files using huggingface-cli, as described in the Quick Start section.
Usage Examples
Basic Usage
Simple chat template
<｜begin▁of▁sentence｜>User: {user_message_1}
Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}
Assistant:
Chat template with system prompt
<｜begin▁of▁sentence｜>{system_message}
User: {user_message_1}
Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}
Assistant:
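As an illustration, here is the simple template filled in with hypothetical messages (not taken from the model card):
<｜begin▁of▁sentence｜>User: What is the capital of France?
Assistant: The capital of France is Paris.<｜end▁of▁sentence｜>User: And of Germany?
Assistant: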
Advanced Usage
Llama.cpp
llama.cpp/main -m DeepSeek-V2-Lite-Chat.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
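For example, a single-turn prompt following the simple template above could be passed directly; this is a sketch that assumes a bash-style shell, where $'...' expands \n into real newlines:
llama.cpp/main -m DeepSeek-V2-Lite-Chat.Q8_0.gguf --color -p $'<｜begin▁of▁sentence｜>User: Write a haiku about autumn.\nAssistant:'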
Documentation
Files
IMatrix
Status: ✅ Available
Link: here
Common Quants
All Quants
Technical Details
Original Model: deepseek-ai/DeepSeek-V2-Lite-Chat
Original dtype: BF16 (bfloat16)
Quantized by: llama.cpp fork PR 7519
IMatrix dataset: here
FAQ
Why is the IMatrix not applied everywhere?
According to this investigation, only the lower quantizations appear to benefit from the imatrix input (based on hellaswag results).
How do I merge a split GGUF?
- Make sure you have gguf-split available
  - To get hold of gguf-split, navigate to https://github.com/ggerganov/llama.cpp/releases
  - Download the appropriate zip for your system from the latest release
  - Unzip the archive and you should be able to find gguf-split
- Locate your GGUF chunks folder (ex: DeepSeek-V2-Lite-Chat.Q8_0)
- Run gguf-split --merge DeepSeek-V2-Lite-Chat.Q8_0/DeepSeek-V2-Lite-Chat.Q8_0-00001-of-XXXXX.gguf DeepSeek-V2-Lite-Chat.Q8_0.gguf
  - Make sure to point gguf-split to the first chunk of the split.
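As an optional sanity check, the merged single file can be loaded with llama.cpp just like any other quant, for example:
llama.cpp/main -m DeepSeek-V2-Lite-Chat.Q8_0.gguf -p "Hello"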
Got a suggestion? Ping me @legraphista!