DeepSeek-V2-Lite-Chat-IMat-GGUF
Llama.cpp imatrix quantization of deepseek-ai/DeepSeek-V2-Lite-Chat
This project offers the llama.cpp imatrix quantization of the deepseek-ai/DeepSeek-V2-Lite-Chat model, providing various quantized versions for different usage scenarios.
Quick Start
Prerequisites
Ensure you have huggingface-cli installed. You can install it using the following command:
pip install -U "huggingface_hub[cli]"
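If the command is not found afterwards, you can confirm that the CLI is on your PATH with:
huggingface-cli --help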
Downloading the Model
To download a specific file, use the following command:
huggingface-cli download legraphista/DeepSeek-V2-Lite-Chat-IMat-GGUF --include "DeepSeek-V2-Lite-Chat.Q8_0.gguf" --local-dir ./
If the model is larger than 50GB and split into multiple files, download all parts to a local folder:
huggingface-cli download legraphista/DeepSeek-V2-Lite-Chat-IMat-GGUF --include "DeepSeek-V2-Lite-Chat.Q8_0/*" --local-dir DeepSeek-V2-Lite-Chat.Q8_0
# see the FAQ for merging split GGUFs
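If you only need a smaller quant, the same --include pattern works for any file listed under Documentation below; for example (assuming a Q4_K_M file is published in this repo):
huggingface-cli download legraphista/DeepSeek-V2-Lite-Chat-IMat-GGUF --include "DeepSeek-V2-Lite-Chat.Q4_K_M.gguf" --local-dir ./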
Features
- Multiple Quantized Versions: Offers a variety of quantized versions, including different bit depths and quantization types, to meet diverse performance and storage requirements.
- IMatrix Quantization: Some versions use an importance matrix (IMatrix) during quantization, which can improve output quality, particularly at lower bit depths (see the FAQ).
Installation
The installation mainly involves downloading the model files using huggingface-cli, as described in the Quick Start section.
Usage Examples
Basic Usage
Simple chat template
<｜begin▁of▁sentence｜>User: {user_message_1}
Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}
Assistant:
Chat template with system prompt
<｜begin▁of▁sentence｜>{system_message}
User: {user_message_1}
Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}
Assistant:
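As an illustration, here is the simple template filled in with hypothetical messages (not taken from the model card):
<｜begin▁of▁sentence｜>User: What is the capital of France?
Assistant: The capital of France is Paris.<｜end▁of▁sentence｜>User: And of Germany?
Assistant: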
Advanced Usage
Llama.cpp
llama.cpp/main -m DeepSeek-V2-Lite-Chat.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
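For example, a single-turn prompt following the simple template above could be passed directly; this is a sketch that assumes a bash-style shell, where $'...' expands \n into real newlines:
llama.cpp/main -m DeepSeek-V2-Lite-Chat.Q8_0.gguf --color -p $'<｜begin▁of▁sentence｜>User: Write a haiku about autumn.\nAssistant:'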
Documentation
Files
IMatrix
Status: ✅ Available
Link: here
Common Quants
All Quants
Technical Details
Original Model: deepseek-ai/DeepSeek-V2-Lite-Chat
Original dtype: BF16 (bfloat16)
Quantized by: llama.cpp fork PR 7519
IMatrix dataset: here
FAQ
Why is the IMatrix not applied everywhere?
According to this investigation, only the lower quantizations appear to benefit from the imatrix input (based on hellaswag results).
How do I merge a split GGUF?
- Make sure you have gguf-split available
  - To get hold of gguf-split, navigate to https://github.com/ggerganov/llama.cpp/releases
  - Download the appropriate zip for your system from the latest release
  - Unzip the archive and you should be able to find gguf-split
- Locate your GGUF chunks folder (ex: DeepSeek-V2-Lite-Chat.Q8_0)
- Run gguf-split --merge DeepSeek-V2-Lite-Chat.Q8_0/DeepSeek-V2-Lite-Chat.Q8_0-00001-of-XXXXX.gguf DeepSeek-V2-Lite-Chat.Q8_0.gguf
  - Make sure to point gguf-split to the first chunk of the split.
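As an optional sanity check, the merged single file can be loaded with llama.cpp just like any other quant, for example:
llama.cpp/main -m DeepSeek-V2-Lite-Chat.Q8_0.gguf -p "Hello"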
Got a suggestion? Ping me @legraphista!