openchat-3.6-8b-20240522-IMat-GGUF
Llama.cpp imatrix quantization of openchat/openchat-3.6-8b-20240522
This project offers a quantized version of the openchat/openchat-3.6-8b-20240522 model using the llama.cpp imatrix quantization method. It provides various quantization types in the GGUF format, making it suitable for different resource requirements.
Original Model: openchat/openchat-3.6-8b-20240522
Original dtype: BF16 (bfloat16)
Quantized by: llama.cpp b3006
IMatrix dataset: here
Features
- Multiple Quantization Types: Offers a wide range of quantization types, including IMatrix and common quantizations, to meet different resource and performance needs.
- Easy Download: Supports downloading using the huggingface-cli, with instructions for handling split model files.
- Inference Templates: Provides simple chat templates and chat templates with system prompts for inference.
Installation
If you do not have huggingface-cli installed, you can install it using the following command:
pip install -U "huggingface_hub[cli]"
Usage Examples
Downloading using huggingface-cli
To download a specific file:
huggingface-cli download legraphista/openchat-3.6-8b-20240522-IMat-GGUF --include "openchat-3.6-8b-20240522.Q8_0.gguf" --local-dir ./
If the model file is split into multiple files, you can download them all to a local folder:
huggingface-cli download legraphista/openchat-3.6-8b-20240522-IMat-GGUF --include "openchat-3.6-8b-20240522.Q8_0/*" --local-dir openchat-3.6-8b-20240522.Q8_0
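The same two downloads can also be scripted from Python. The following is a minimal sketch, assuming the huggingface_hub package from the Installation step is available. hf_hub_download and snapshot_download are functions of that package; the helper names (quant_filename, split_pattern, download_single, download_split) are hypothetical and introduced here only for illustration.

```python
REPO_ID = "legraphista/openchat-3.6-8b-20240522-IMat-GGUF"
MODEL_NAME = "openchat-3.6-8b-20240522"


def quant_filename(quant: str) -> str:
    """File name of a single-file quant, e.g. quant='Q8_0'."""
    return f"{MODEL_NAME}.{quant}.gguf"


def split_pattern(quant: str) -> str:
    """Glob pattern matching every chunk of a split quant."""
    return f"{MODEL_NAME}.{quant}/*"


def download_single(quant: str, local_dir: str = "./") -> str:
    """Download one GGUF file and return its local path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=quant_filename(quant),
        local_dir=local_dir,
    )


def download_split(quant: str, local_dir: str) -> str:
    """Download all chunks of a split quant; returns the local folder path."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[split_pattern(quant)],
        local_dir=local_dir,
    )
```

For example, download_single("Q8_0") mirrors the first command, and download_split("Q8_0", "openchat-3.6-8b-20240522.Q8_0") mirrors the second.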
Inference
Simple chat template
<|begin_of_text|><|start_header_id|>GPT4 Correct User<|end_header_id|>
Can you provide ways to eat combinations of bananas and dragonfruits?<|eot_id|><|start_header_id|>GPT4 Correct Assistant<|end_header_id|>
Sure! Here are some ways to eat bananas and dragonfruits together:
1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey.
2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey.<|eot_id|><|start_header_id|>GPT4 Correct User<|end_header_id|>
What about solving an 2x + 3 = 7 equation?<|eot_id|>
Chat template with system prompt
<|begin_of_text|><|start_header_id|>System<|end_header_id|>
You are a helpful AI.<|eot_id|><|start_header_id|>GPT4 Correct User<|end_header_id|>
Can you provide ways to eat combinations of bananas and dragonfruits?<|eot_id|><|start_header_id|>GPT4 Correct Assistant<|end_header_id|>
Sure! Here are some ways to eat bananas and dragonfruits together:
1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey.
2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey.<|eot_id|><|start_header_id|>GPT4 Correct User<|end_header_id|>
What about solving an 2x + 3 = 7 equation?<|eot_id|>
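Both templates follow the same pattern, so the prompt string can be assembled programmatically. The following is a minimal sketch, not the model's official chat template. build_prompt, ROLE_HEADERS, header_sep and add_generation_prompt are hypothetical names. The separator after each header is an assumption: the templates above show a line break there, and the sketch defaults to two newlines following the Llama 3 convention. Adjust header_sep if your runtime expects something else.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"

# Header names exactly as they appear in the templates above.
ROLE_HEADERS = {
    "system": "System",
    "user": "GPT4 Correct User",
    "assistant": "GPT4 Correct Assistant",
}


def build_prompt(
    messages: List[Dict[str, str]],
    header_sep: str = "\n\n",  # assumption: Llama 3 style blank line after the header
    add_generation_prompt: bool = False,
) -> str:
    """Join chat messages into a single prompt string."""
    parts = [BOS]
    for message in messages:
        header = ROLE_HEADERS[message["role"]]
        content = message["content"].strip()
        parts.append(
            f"<|start_header_id|>{header}<|end_header_id|>{header_sep}{content}{EOT}"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(
            f"<|start_header_id|>{ROLE_HEADERS['assistant']}<|end_header_id|>{header_sep}"
        )
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]
prompt = build_prompt(messages)
print(prompt)
```

Dropping the system entry from messages yields the simple chat template; passing add_generation_prompt=True appends an open assistant header, which some runtimes need before generation starts.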
Llama.cpp
llama.cpp/main -m openchat-3.6-8b-20240522.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
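Because the prompt contains newlines and special tokens, quoting it by hand in a shell can be error-prone. The following is a minimal sketch that builds the same command as an argument list; llama_cpp_command is a hypothetical helper, the flags are the ones shown above, and the binary path is assumed to match your local llama.cpp build.

```python
import shlex
from typing import List


def llama_cpp_command(
    model_path: str,
    prompt: str,
    binary: str = "llama.cpp/main",  # assumption: path to your llama.cpp build
) -> List[str]:
    """Return the argv list for an interactive llama.cpp session."""
    return [binary, "-m", model_path, "--color", "-i", "-p", prompt]


cmd = llama_cpp_command(
    "openchat-3.6-8b-20240522.Q8_0.gguf",
    "<|begin_of_text|><|start_header_id|>GPT4 Correct User<|end_header_id|>\n\n"
    "What about solving an 2x + 3 = 7 equation?<|eot_id|>",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would launch the session without any shell quoting involved.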
Documentation
Files
IMatrix
Status: Available
Link: here
Common Quants
All Quants
Technical Details
According to this investigation, only the lower quantizations appear to benefit from the IMatrix input (as per HellaSwag results).
License
This project uses the llama3 license.
FAQ
Why is the IMatrix not applied everywhere?
According to this investigation, only the lower quantizations appear to benefit from the IMatrix input (as per HellaSwag results).
How do I merge a split GGUF?
- Make sure you have gguf-split available
- To get hold of gguf-split, navigate to https://github.com/ggerganov/llama.cpp/releases
- Download the appropriate zip for your system from the latest release
- Unzip the archive and you should be able to find gguf-split
- Locate your GGUF chunks folder (ex: openchat-3.6-8b-20240522.Q8_0)
- Run gguf-split --merge openchat-3.6-8b-20240522.Q8_0/openchat-3.6-8b-20240522.Q8_0-00001-of-XXXXX.gguf openchat-3.6-8b-20240522.Q8_0.gguf
- Make sure to point gguf-split to the first chunk of the split.
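Since the chunk count (the XXXXX part) varies per quant, locating the first chunk can be automated. The following is a minimal sketch under stated assumptions: chunk names end in -00001-of-NNNNN.gguf with five-digit counters, as the example above suggests, and gguf-split is on your PATH. find_first_chunk and merge_command are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import List

# Assumption: chunk counters are five digits, as in "-00001-of-XXXXX.gguf".
FIRST_CHUNK = re.compile(r"-00001-of-\d{5}\.gguf$")


def find_first_chunk(chunks_dir: str) -> Path:
    """Return the path of the first chunk inside the chunks folder."""
    candidates = sorted(
        p for p in Path(chunks_dir).glob("*.gguf") if FIRST_CHUNK.search(p.name)
    )
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one first chunk in {chunks_dir}, found {len(candidates)}"
        )
    return candidates[0]


def merge_command(
    chunks_dir: str,
    output_path: str,
    binary: str = "gguf-split",  # assumption: gguf-split is on PATH
) -> List[str]:
    """Build the gguf-split --merge command, pointing at the first chunk."""
    return [binary, "--merge", str(find_first_chunk(chunks_dir)), output_path]
```

For example, subprocess.run(merge_command("openchat-3.6-8b-20240522.Q8_0", "openchat-3.6-8b-20240522.Q8_0.gguf"), check=True) would run the merge step described above.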
Got a suggestion? Ping me @legraphista!