# 🚀 Llamacpp imatrix Quantizations of Jedi-3B-1080p by xlangai

This project provides quantized versions of xlangai's Jedi-3B-1080p model. The quants were produced with llama.cpp and an imatrix calibration dataset, run in a variety of environments, and give users a wide range of options for different hardware and quality needs.
## 🚀 Quick Start
Quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp/) release [b5524](https://github.com/ggerganov/llama.cpp/releases/tag/b5524). Original model: https://huggingface.co/xlangai/Jedi-3B-1080p. All quants were made using the imatrix option with a dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8). You can run these quants in [LM Studio](https://lmstudio.ai/), or directly with [llama.cpp](https://github.com/ggerganov/llama.cpp) or any other llama.cpp-based project.
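For example, a quant can be run locally with the `llama-cli` binary that ships with llama.cpp. The snippet below is only a minimal sketch: it assumes you have built llama.cpp from the release above and downloaded the Q4_K_M file into the current directory, and the prompt, token count, and GPU layer count are placeholders to adjust for your setup.

```bash
# Run a downloaded quant with llama.cpp (sketch).
# -ngl 99 offloads as many layers as possible to the GPU; drop it for CPU-only use.
./llama-cli \
  -m ./xlangai_Jedi-3B-1080p-Q4_K_M.gguf \
  -ngl 99 \
  -n 128 \
  -p "Hello, who are you?"
```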
## ✨ Main Features
- **Multiple quantization types**: A wide range of quant types (bf16, Q8_0, Q6_K_L, and more) to suit different performance and quality requirements.
- **Embedding/output weight handling**: Some quants (such as Q3_K_XL and Q4_K_L) quantize the embedding and output weights to Q8_0 rather than the usual default.
- **Online repacking**: Some quants support online repacking, automatically optimizing the weights for your hardware.
## 📦 Installation Guide
### Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:
pip install -U "huggingface_hub[cli]"
Then, you can target the specific file you want to download:
huggingface-cli download bartowski/xlangai_Jedi-3B-1080p-GGUF --include "xlangai_Jedi-3B-1080p-Q4_K_M.gguf" --local-dir ./
If the model size exceeds 50GB, it will be split into multiple files. To download them all to your local folder, please run:
huggingface-cli download bartowski/xlangai_Jedi-3B-1080p-GGUF --include "xlangai_Jedi-3B-1080p-Q8_0/*" --local-dir ./
You can either specify a new local directory (such as xlangai_Jedi-3B-1080p-Q8_0) or download them all to the current directory (./).
## 💻 Usage Examples
### Prompt format
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
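As a quick illustration (not part of the original card), the template above can be filled in and passed directly to `llama-cli`; the model path, system prompt, and user message below are placeholders:

```bash
# Pass the filled-in ChatML-style template as the raw prompt.
./llama-cli \
  -m ./xlangai_Jedi-3B-1080p-Q4_K_M.gguf \
  -p "<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello.<|im_end|>
<|im_start|>assistant
"
```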
## 📚 Detailed Documentation
### Download file selection
File Name | Quant Type | File Size | Split | Description |
---|---|---|---|---|
[Jedi-3B-1080p-bf16.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-bf16.gguf) | BF16 | 6.18GB | false | Full BF16 weights. |
[Jedi-3B-1080p-Q8_0.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q8_0.gguf) | Q8_0 | 3.29GB | false | Extremely high quality, generally unneeded but max available quant. |
[Jedi-3B-1080p-Q6_K_L.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q6_K_L.gguf) | Q6_K_L | 2.61GB | false | Embedding and output weights quantized to Q8_0. Very high quality, near perfect, *recommended*. |
[Jedi-3B-1080p-Q6_K.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q6_K.gguf) | Q6_K | 2.54GB | false | Very high quality, near perfect, *recommended*. |
[Jedi-3B-1080p-Q5_K_L.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q5_K_L.gguf) | Q5_K_L | 2.30GB | false | Embedding and output weights quantized to Q8_0. High quality, *recommended*. |
[Jedi-3B-1080p-Q5_K_M.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q5_K_M.gguf) | Q5_K_M | 2.22GB | false | High quality, *recommended*. |
[Jedi-3B-1080p-Q5_K_S.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q5_K_S.gguf) | Q5_K_S | 2.17GB | false | High quality, *recommended*. |
[Jedi-3B-1080p-Q4_K_L.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q4_K_L.gguf) | Q4_K_L | 2.01GB | false | Embedding and output weights quantized to Q8_0. Good quality, *recommended*. |
[Jedi-3B-1080p-Q4_1.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q4_1.gguf) | Q4_1 | 2.00GB | false | Legacy format, similar performance to Q4_K_S but with improved tokens per watt on Apple silicon. |
[Jedi-3B-1080p-Q4_K_M.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q4_K_M.gguf) | Q4_K_M | 1.93GB | false | Good quality, default size for most use cases, *recommended*. |
[Jedi-3B-1080p-Q4_K_S.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q4_K_S.gguf) | Q4_K_S | 1.83GB | false | Slightly lower quality with greater space savings, *recommended*. |
[Jedi-3B-1080p-Q4_0.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q4_0.gguf) | Q4_0 | 1.83GB | false | Legacy format, supports online repacking for ARM and AVX CPU inference. |
[Jedi-3B-1080p-IQ4_NL.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-IQ4_NL.gguf) | IQ4_NL | 1.83GB | false | Similar to IQ4_XS, but slightly larger. Supports online repacking for ARM CPU inference. |
[Jedi-3B-1080p-Q3_K_XL.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q3_K_XL.gguf) | Q3_K_XL | 1.78GB | false | Embedding and output weights quantized to Q8_0. Lower quality but usable, good for low-memory setups. |
[Jedi-3B-1080p-IQ4_XS.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-IQ4_XS.gguf) | IQ4_XS | 1.74GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
[Jedi-3B-1080p-Q3_K_L.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q3_K_L.gguf) | Q3_K_L | 1.71GB | false | Lower quality but usable, good for low-memory setups. |
[Jedi-3B-1080p-Q3_K_M.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q3_K_M.gguf) | Q3_K_M | 1.59GB | false | Low quality. |
[Jedi-3B-1080p-IQ3_M.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-IQ3_M.gguf) | IQ3_M | 1.49GB | false | Medium-low quality, newer method with performance comparable to Q3_K_M. |
[Jedi-3B-1080p-Q3_K_S.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q3_K_S.gguf) | Q3_K_S | 1.45GB | false | Low quality, not recommended. |
[Jedi-3B-1080p-IQ3_XS.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-IQ3_XS.gguf) | IQ3_XS | 1.39GB | false | Lower quality, newer method with decent performance, slightly better than Q3_K_S. |
[Jedi-3B-1080p-Q2_K_L.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q2_K_L.gguf) | Q2_K_L | 1.35GB | false | Embedding and output weights quantized to Q8_0. Very low quality but surprisingly usable. |
[Jedi-3B-1080p-IQ3_XXS.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-IQ3_XXS.gguf) | IQ3_XXS | 1.28GB | false | Lower quality, newer method with decent performance, comparable to Q3 quants. |
[Jedi-3B-1080p-Q2_K.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-Q2_K.gguf) | Q2_K | 1.27GB | false | Very low quality but surprisingly usable. |
[Jedi-3B-1080p-IQ2_M.gguf](https://huggingface.co/bartowski/xlangai_Jedi-3B-1080p-GGUF/blob/main/xlangai_Jedi-3B-1080p-IQ2_M.gguf) | IQ2_M | 1.14GB | false | Relatively low quality, uses state-of-the-art techniques to be surprisingly usable. |
### Embedding/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of the usual default.
### ARM/AVX information
Previously, you would download Q4_0_4_4/4_8/8_8 files, whose weights were interleaved in memory to improve performance on ARM and AVX machines by loading more data in a single pass.
Now, however, there is what is called "online repacking" of the weights; see [this PR](https://github.com/ggerganov/llama.cpp/pull/9921) for details. If you use Q4_0 and your hardware would benefit from repacking the weights, it is done automatically on the fly.
As of llama.cpp build [b4282](https://github.com/ggerganov/llama.cpp/releases/tag/b4282), you will not be able to run the Q4_0_X_X files and will need to use Q4_0 instead.
Additionally, if you want slightly better quality, you can use IQ4_NL thanks to [this PR](https://github.com/ggerganov/llama.cpp/pull/10541), which also repacks the weights for ARM (though currently only the 4_4 variant). Loading may take longer, but overall speed will increase.
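As a rough sketch (the model path and thread count below are placeholders), a CPU-only run of the Q4_0 quant needs no extra flags; if your ARM or AVX hardware benefits from repacking, it happens automatically when the weights are loaded:

```bash
# CPU-only run of the Q4_0 quant; online repacking is applied at load time
# on hardware that benefits from it, with no additional options required.
./llama-cli \
  -m ./xlangai_Jedi-3B-1080p-Q4_0.gguf \
  -t 8 \
  -p "Hello, who are you?"
```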
### Which file should I choose?
First, figure out how big a model you can run. To do this, you need to know how much RAM and/or VRAM you have.
- If you want the model to run as fast as possible, fit the whole model into your GPU's VRAM: choose a quant with a file size 1-2GB smaller than your GPU's total memory (a quick way to check this is sketched right after this list).
- If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then choose a quant with a file size 1-2GB smaller than that total.
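As a hypothetical sketch (assuming an NVIDIA GPU and the `nvidia-smi` tool), you can check total VRAM and then apply the 1-2GB headroom rule against the table above:

```bash
# Print total VRAM in MiB, e.g. 8192 for an 8GB card.
nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
# 8192 MiB minus ~1-2GB of headroom leaves roughly 6-7GB, so for this 3B model
# even the largest files here (Q8_0 at 3.29GB or BF16 at 6.18GB) would fit.
```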
Next, decide whether to use an "I-quant" or a "K-quant".
- If you don't want to think too much, grab one of the K-quants. These are in the format QX_K_X, such as Q5_K_M.
- If you want to dig deeper, check out this very useful feature chart: [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix).

Generally speaking, if you are aiming below Q4 and running cuBLAS (Nvidia) or rocBLAS (AMD), you should look at the I-quants. These are in the format IQX_X, such as IQ3_S; they are newer and offer better performance for their size.

The I-quants can also be used on CPU, but they are slower than their K-quant equivalents, so you'll need to weigh speed against performance.
## 🔧 Technical Details
### Quantization tool

Quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp/) release [b5524](https://github.com/ggerganov/llama.cpp/releases/tag/b5524).

### Calibration dataset

All quants were made using the imatrix option with a dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).

### Online repacking

Some quants support online repacking; see [this PR](https://github.com/ggerganov/llama.cpp/pull/9921) for details.
## 📄 License
This project is released under the Apache-2.0 license.

## Thanks

Thanks to Kalomaze and Dampf for their assistance in creating the imatrix calibration dataset. Thanks to ZeroWw for the inspiration behind the embedding/output weight experiments. Thanks to LM Studio for sponsoring this work.

If you want to support my work, visit my ko-fi page: https://ko-fi.com/bartowski

