Llamacpp imatrix Quantizations of Dans-PersonalityEngine-V1.3.0-12b by PocketDoc
This project offers quantized versions of the Dans-PersonalityEngine-V1.3.0-12b model, leveraging the llama.cpp framework. It provides various quantization types to meet different performance and quality requirements, enabling users to run the model efficiently on different hardware platforms.
Quick Start
Quantization Details
The project uses llama.cpp release b5466 for quantization. The original model can be found at https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.3.0-12b. All quantizations are made using the imatrix option with a dataset from here.
Running the Model
- Using LM Studio: You can run the quantized models in LM Studio.
- Using llama.cpp: Run the models directly with llama.cpp, or with any other llama.cpp-based project.
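The llama.cpp option above can be sketched in code. This is an illustrative helper only: the `llama-cli` binary name reflects current llama.cpp releases, and the model file name follows this repo's naming, but both are assumptions, and the `build_llama_command` helper is ours, not part of llama.cpp.

```python
# Illustrative sketch (assumptions: "llama-cli" binary from a recent
# llama.cpp build; GGUF file name from this repo's naming scheme).
import shlex

def build_llama_command(model_path: str, prompt: str, n_predict: int = 128) -> list[str]:
    """Return an argv list for invoking llama.cpp's CLI with a GGUF model."""
    return [
        "llama-cli",
        "-m", model_path,      # path to the downloaded .gguf file
        "-p", prompt,          # prompt text
        "-n", str(n_predict),  # number of tokens to generate
    ]

cmd = build_llama_command(
    "PocketDoc_Dans-PersonalityEngine-V1.3.0-12b-Q4_K_M.gguf", "Hello"
)
print(shlex.join(cmd))
```

The argv-list form can be passed straight to `subprocess.run` without shell quoting issues.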
Features
Supported Languages
- en, ar, de, fr, es, hi, pt, ja, ko
Tags
- general-purpose, roleplay, storywriting, chemistry, biology, code, climate, axolotl, text-generation-inference, finetune, legal, medical, finance
Datasets
The model is trained on a wide range of datasets, including:
- PocketDoc/Dans-Prosemaxx-RP
- PocketDoc/Dans-Personamaxx-Logs-2
- ... (and many others as listed in the original document)
Base Model
- Base model: PocketDoc/Dans-PersonalityEngine-V1.3.0-12b
- Thumbnail: https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.3.0-12b/resolve/main/resources/pe.png
- Base model relation: quantized
- License: apache-2.0
Installation
Prompt Format
The prompt format for the model is as follows:
[gMASK]<sop><|system|>{system_prompt}<|endoftext|><|user|>{prompt}<|endoftext|><|assistant|>
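The template above can be filled in programmatically. The template string below is copied verbatim from this README; the `build_prompt` helper name is our own, shown only as a sketch of the substitution.

```python
# Sketch of filling in the prompt template from this README.
# (The helper name is ours; only the template string comes from the card.)
def build_prompt(system_prompt: str, prompt: str) -> str:
    """Substitute system and user messages into the model's prompt template."""
    template = (
        "[gMASK]<sop><|system|>{system_prompt}<|endoftext|>"
        "<|user|>{prompt}<|endoftext|><|assistant|>"
    )
    return template.format(system_prompt=system_prompt, prompt=prompt)

print(build_prompt("You are a helpful assistant.", "Hello!"))
```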
Downloading Files
You can download specific files from the following table:
Embed/Output Weights
Some of the quantizations (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embeddings and output weights quantized to Q8_0 instead of their usual defaults.
Downloading using huggingface-cli
Click to view download instructions
First, make sure you have huggingface-cli installed:
pip install -U "huggingface_hub[cli]"
Then, you can target the specific file you want:
huggingface-cli download bartowski/PocketDoc_Dans-PersonalityEngine-V1.3.0-12b-GGUF --include "PocketDoc_Dans-PersonalityEngine-V1.3.0-12b-Q4_K_M.gguf" --local-dir ./
If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:
huggingface-cli download bartowski/PocketDoc_Dans-PersonalityEngine-V1.3.0-12b-GGUF --include "PocketDoc_Dans-PersonalityEngine-V1.3.0-12b-Q8_0/*" --local-dir ./
You can either specify a new local-dir (PocketDoc_Dans-PersonalityEngine-V1.3.0-12b-Q8_0) or download them all in place (./)
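The two `--include` patterns above follow a simple rule: a single file is matched by name, while a model split across files (over 50GB) lives in a quant-named subfolder matched with `/*`. The helper below is purely illustrative (the function name is ours, not part of any official tool), showing how that pattern is derived.

```python
# Illustrative helper (name is ours, not part of huggingface-cli):
# builds the --include pattern used in the download commands above.
# Per this README, files over 50GB are split into a subfolder named
# after the quant, so the pattern ends in "/*" in that case.
def include_pattern(base: str, quant: str, split: bool) -> str:
    """Return the --include glob for a given quant of a GGUF repo."""
    name = f"{base}-{quant}"
    return f"{name}/*" if split else f"{name}.gguf"

base = "PocketDoc_Dans-PersonalityEngine-V1.3.0-12b"
print(include_pattern(base, "Q4_K_M", split=False))  # single-file quant
print(include_pattern(base, "Q8_0", split=True))     # split quant folder
```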
Technical Details
ARM/AVX Information
Previously, you would download Q4_0_4_4/4_8/8_8 files, whose weights were interleaved in memory to improve performance on ARM and AVX machines by loading more data in a single pass.
Now, there is "online repacking" for weights. Details can be found in this PR. If you use Q4_0 and your hardware would benefit from repacking weights, it will do it automatically on the fly.
As of llama.cpp build b4282, you cannot run the Q4_0_X_X files and need to use Q4_0 instead.
Additionally, if you want slightly better quality, you can use IQ4_NL thanks to this PR, which will also repack the weights for ARM (only the 4_4 for now). The loading time may be slower, but it will result in an overall speed increase.
Click to view Q4_0_X_X information (deprecated)
I'm keeping this section to show the potential theoretical uplift in performance from using the Q4_0_X_X quants.
License
The project is licensed under the apache-2.0 license.