Geitje 7B Chat - GPTQ
This repository provides GPTQ model files for Geitje 7B Chat, offering multiple quantisation options for different hardware and requirements.
Quick Start
Downloading the Model
In text-generation-webui
- To download from the main branch, enter TheBloke/GEITje-7B-chat-GPTQ in the "Download model" box.
- To download from another branch, add :branchname to the end of the download name, e.g., TheBloke/GEITje-7B-chat-GPTQ:gptq-4bit-32g-actorder_True
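The download-name convention above (repository, optionally followed by a colon and a branch) can be sketched as a small parser. This is only an illustration of the naming scheme; the function name split_download_name is hypothetical and is not part of text-generation-webui.

```python
from typing import Tuple


def split_download_name(name: str) -> Tuple[str, str]:
    """Split 'user/repo[:branch]' into (repo_id, branch).

    The branch defaults to 'main' when no ':branchname' suffix is given.
    """
    repo_id, separator, branch = name.partition(":")
    if not repo_id or (separator and not branch):
        raise ValueError(f"malformed download name: {name!r}")
    return repo_id, branch or "main"


# Example with the names used in this README.
print(split_download_name("TheBloke/GEITje-7B-chat-GPTQ:gptq-4bit-32g-actorder_True"))
```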
From the command line
- Install the huggingface-hub Python library:
pip3 install huggingface-hub
- To download the main branch to a folder called GEITje-7B-chat-GPTQ:
mkdir GEITje-7B-chat-GPTQ
huggingface-cli download TheBloke/GEITje-7B-chat-GPTQ --local-dir GEITje-7B-chat-GPTQ --local-dir-use-symlinks False
- To download from a different branch, add the --revision parameter:
mkdir GEITje-7B-chat-GPTQ
huggingface-cli download TheBloke/GEITje-7B-chat-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir GEITje-7B-chat-GPTQ --local-dir-use-symlinks False
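The same download can also be done from Python. The sketch below assumes the huggingface-hub library installed above and uses its snapshot_download function; the helper names download_kwargs and download are made up for illustration, and the symlink flag from the CLI example is left out because newer library versions may not need it.

```python
REPO_ID = "TheBloke/GEITje-7B-chat-GPTQ"


def download_kwargs(branch: str = "main", local_dir: str = "GEITje-7B-chat-GPTQ") -> dict:
    """Build keyword arguments that mirror the CLI flags (--revision, --local-dir)."""
    return {"repo_id": REPO_ID, "revision": branch, "local_dir": local_dir}


def download(branch: str = "main", local_dir: str = "GEITje-7B-chat-GPTQ") -> str:
    """Download one branch of the repository and return the local folder path."""
    # Imported lazily so download_kwargs() can be used without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(branch, local_dir))


# Usage (needs network access):
# download("gptq-4bit-32g-actorder_True")
```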
Using the Model in text-generation-webui
- Ensure you're using the latest version of text-generation-webui.
- Click the Model tab.
- Under Download custom model or LoRA, enter TheBloke/GEITje-7B-chat-GPTQ.
  - To download from a specific branch, enter, for example, TheBloke/GEITje-7B-chat-GPTQ:gptq-4bit-32g-actorder_True
  - See the "Provided files, and GPTQ parameters" section for the list of branches for each option.
- Click Download.
- Once the model has finished downloading, it will say "Done".
- In the top left, click the refresh icon next to Model.
- In the Model dropdown, choose the model you just downloaded:
GEITje-7B-chat-GPTQ
- The model will automatically load and is now ready for use!
- If you want any custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
  - Note that you do not need to and should not set manual GPTQ parameters anymore. These are set automatically from the file quantize_config.json.
- Once you're ready, click the Text Generation tab and enter a prompt to get started!
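To see which GPTQ parameters a downloaded branch will load with, the quantize_config.json file mentioned in the steps above can be inspected directly. The sketch below assumes the file uses the key names bits, group_size, desc_act and damp_percent (the AutoGPTQ convention); a missing key is reported as None rather than guessed.

```python
import json
from pathlib import Path

# Key names are an assumption based on the AutoGPTQ config convention.
GPTQ_KEYS = ("bits", "group_size", "desc_act", "damp_percent")


def parse_gptq_params(config_text: str) -> dict:
    """Extract the main GPTQ settings from the text of quantize_config.json."""
    config = json.loads(config_text)
    return {key: config.get(key) for key in GPTQ_KEYS}


def read_gptq_params(model_dir: str) -> dict:
    """Read quantize_config.json from a downloaded model folder."""
    return parse_gptq_params((Path(model_dir) / "quantize_config.json").read_text())


# Usage, assuming the folder name from the download steps:
# print(read_gptq_params("GEITje-7B-chat-GPTQ"))
```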
Serving the Model from Text Generation Inference (TGI)
It's recommended to use TGI version 1.1.0 or later. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0
Example Docker parameters:
--model-id TheBloke/GEITje-7B-chat-GPTQ --port 3000 --quant
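Once a TGI container is running, it can be queried over HTTP. The sketch below uses only the Python standard library and TGI's /generate endpoint. The base URL http://127.0.0.1:3000 is an assumption (it presumes the container's port 3000 is published on the same port of the local machine), and the helper names are hypothetical. The inputs string should already follow the prompt template given in this README.

```python
import json
import urllib.request


def build_generate_request(inputs: str,
                           base_url: str = "http://127.0.0.1:3000",
                           max_new_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for TGI's /generate endpoint."""
    payload = {"inputs": inputs, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        base_url + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(inputs: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(inputs, **kwargs)) as response:
        return json.loads(response.read())["generated_text"]


# Usage (needs a running TGI server):
# print(generate("<|user|>\nWat is een geit?\n<|assistant|>"))
```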
Features
- Multiple GPTQ parameter permutations are provided, allowing users to choose the best option for their hardware and requirements.
- The model supports various inference servers/webuis, including text-generation-webui, KoboldAI United, LoLLMS Web UI, and Hugging Face Text Generation Inference (TGI).
Installation
Prerequisites
- For GPTQ models, Linux (NVidia/AMD) and Windows (NVidia only) are currently supported. macOS users should use GGUF models.
- Install the necessary libraries as described in the "Quick Start" section.
Documentation
Model Information
Property | Details |
---|---|
Model Type | mistral |
Base Model | Rijgersberg/GEITje-7B-chat |
Training Data | Rijgersberg/no_robots_nl, Rijgersberg/ultrachat_10k_nl |
Model Creator | Edwin Rijgersberg |
Quantized By | TheBloke |
License | apache-2.0 |
Prompt Template
<|user|>
{prompt}
<|assistant|>
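A minimal sketch of filling in the template above for a single user turn. The function name format_prompt is hypothetical, and the line breaks follow the template exactly as printed here; whether the model expects a trailing newline after the assistant tag is not stated in this README.

```python
PROMPT_TEMPLATE = "<|user|>\n{prompt}\n<|assistant|>"


def format_prompt(prompt: str) -> str:
    """Wrap a single user message in the chat template shown above."""
    return PROMPT_TEMPLATE.format(prompt=prompt)


print(format_prompt("Wat is de hoofdstad van Nederland?"))
```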
Provided files, and GPTQ parameters
Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
Each separate quant is in a different branch. See the Quick Start section for instructions on fetching from different branches.
Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with Transformers.
Explanation of GPTQ parameters
- Bits: The bit size of the quantised model.
- GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" means no grouping and gives the lowest VRAM use.
- Act Order: True or False. Also known as desc_act. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
- Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is default, but 0.1 results in slightly better accuracy.
- GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
- Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences.
- ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.
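The trade-off between group size and VRAM can be illustrated with a back-of-envelope calculation. The sketch below assumes that each group of weights stores one 16-bit scale plus one zero point of the same width as the weights; real file layouts may differ, so treat the numbers as a rough guide only.

```python
from typing import Optional


def effective_bits_per_weight(bits: int,
                              group_size: Optional[int] = None,
                              scale_bits: int = 16) -> float:
    """Approximate storage cost per weight, including per-group metadata.

    Assumption: every group carries one scale (scale_bits wide) and one
    zero point (bits wide). With no grouping the overhead is treated as zero.
    """
    if group_size is None:
        return float(bits)
    return bits + (scale_bits + bits) / group_size


# Smaller groups mean more metadata per weight, hence more VRAM.
for group in (32, 64, 128):
    print(group, effective_bits_per_weight(4, group))
```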
Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
---|---|---|---|---|---|---|---|---|---|
main | 4 | 128 | Yes | 0.1 | Dolly 15K Dutch | 4096 | 4.16 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.1 | Dolly 15K Dutch | 4096 | 4.57 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.1 | Dolly 15K Dutch | 4096 | 7.52 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
gptq-8bit-128g-actorder_True | 8 | 128 | Yes | 0.1 | Dolly 15K Dutch | 4096 | 7.68 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
gptq-8bit-32g-actorder_True | 8 | 32 | Yes | 0.1 | Dolly 15K Dutch | 4096 | 8.17 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
gptq-4bit-64g-actorder_True | 4 | 64 | Yes | 0.1 | Dolly 15K Dutch | 4096 | 4.29 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
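The branch names in the table encode their own parameters, which makes it possible to select a branch programmatically. The sketch below is inferred from the names listed above, not from any official specification: it reads a group size of -1 as "None" (as the table does for gptq-8bit--1g-actorder_True), and it does not cover main, which the table lists as 4-bit with group size 128.

```python
import re

# Pattern inferred from the branch names in the table above.
BRANCH_PATTERN = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def parse_branch(branch: str) -> dict:
    """Decode bits, group size and Act Order from a branch name."""
    match = BRANCH_PATTERN.match(branch)
    if match is None:
        raise ValueError(f"not a parameterised GPTQ branch name: {branch!r}")
    bits, group_size, act_order = match.groups()
    group = int(group_size)
    return {
        "bits": int(bits),
        "group_size": None if group == -1 else group,  # -1 means no grouping
        "act_order": act_order == "True",
    }


print(parse_branch("gptq-8bit--1g-actorder_True"))
```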
Repositories available
- AWQ model(s) for GPU inference.
- GPTQ models for GPU inference, with multiple quantisation parameter options.
- 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference
- Edwin Rijgersberg's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions
Technical Details
Quantisation
These files were quantised using hardware kindly provided by Massed Compute.
Known compatible clients / servers
GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.
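These GPTQ models are known to work in the following inference servers/webuis:
- text-generation-webui
- KoboldAI United
- LoLLMS Web UI
- Hugging Face Text Generation Inference (TGI)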
This may not be a complete list; if you know of others, please let me know!
License
This model is licensed under the apache-2.0 license.

