GEITje-7B-chat-GPTQ Open-Source Dutch Dialogue Model - Free Deployment to Support Chatting Tasks

Geitje 7B Chat GPTQ

Quantised by TheBloke (original model by Edwin Rijgersberg)

GEITje-7B-chat is a Dutch conversation model based on the Mistral architecture, specifically optimized for chat and dialogue tasks.

Large Language Model

Transformers

Open Source License: Apache-2.0

#Dutch conversation #7B parameter scale #Mistral architecture

Downloads 21

Release Time: 12/19/2023

Model Overview

This model is a 7B-parameter Dutch conversation model based on the Mistral architecture, trained on the no_robots_nl and ultrachat_10k_nl datasets, suitable for Dutch dialogue generation tasks.

Model Features

Dutch language optimization

Specifically trained and optimized for Dutch dialogue tasks

Conversational ability

Focused on generating natural and fluent dialogue responses

Open-source license

Uses Apache 2.0 license, allowing commercial use

Model Capabilities

Dutch dialogue generation

Chatbot development

Natural language understanding

Use Cases

Customer service

Dutch customer service bot

Automated customer service system for the Dutch market

Provides smooth and natural Dutch customer service conversations

Education

Dutch language learning assistant

Helps learners practice Dutch conversations

Provides a natural Dutch conversation practice environment

🚀 Geitje 7B Chat - GPTQ

This repository provides GPTQ model files for Geitje 7B Chat, offering multiple quantisation options for different hardware and requirements.

🚀 Quick Start

Downloading the Model

In text-generation-webui

To download from the main branch, enter TheBloke/GEITje-7B-chat-GPTQ in the "Download model" box.
To download from another branch, add :branchname to the end of the download name, e.g., TheBloke/GEITje-7B-chat-GPTQ:gptq-4bit-32g-actorder_True

From the command line

Install the huggingface-hub Python library:

pip3 install huggingface-hub

To download the main branch to a folder called GEITje-7B-chat-GPTQ:

mkdir GEITje-7B-chat-GPTQ
huggingface-cli download TheBloke/GEITje-7B-chat-GPTQ --local-dir GEITje-7B-chat-GPTQ --local-dir-use-symlinks False

To download from a different branch, add the --revision parameter:

mkdir GEITje-7B-chat-GPTQ
huggingface-cli download TheBloke/GEITje-7B-chat-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir GEITje-7B-chat-GPTQ --local-dir-use-symlinks False
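The same downloads can also be scripted from Python. The sketch below is a minimal illustration, not part of the original instructions: it assumes the `huggingface-hub` package installed above, uses its `snapshot_download` function, and introduces two hypothetical helpers (`split_download_name`, `local_dir_for`) that accept the `repo:branch` naming shown for text-generation-webui.

```python
from typing import Tuple

DEFAULT_BRANCH = "main"


def split_download_name(name: str) -> Tuple[str, str]:
    """Split 'owner/repo:branch' into (repo_id, branch).

    Hypothetical helper: the branch defaults to 'main' when no ':branch' is given.
    """
    repo_id, sep, branch = name.partition(":")
    return repo_id, (branch if sep and branch else DEFAULT_BRANCH)


def local_dir_for(repo_id: str) -> str:
    """Use the repository name (without the owner) as the target folder."""
    return repo_id.split("/")[-1]


def download(name: str) -> str:
    """Download one branch of the repository and return the local path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    repo_id, branch = split_download_name(name)
    return snapshot_download(
        repo_id=repo_id,
        revision=branch,
        local_dir=local_dir_for(repo_id),
    )


# Example (downloads several GB, so it is left commented out):
# download("TheBloke/GEITje-7B-chat-GPTQ:gptq-4bit-32g-actorder_True")
```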

Using the Model in text-generation-webui

1. Ensure you're using the latest version of text-generation-webui.
2. Click the Model tab.
3. Under Download custom model or LoRA, enter TheBloke/GEITje-7B-chat-GPTQ.
   - To download from a specific branch, enter, for example, TheBloke/GEITje-7B-chat-GPTQ:gptq-4bit-32g-actorder_True
   - See the "Provided files, and GPTQ parameters" section for the list of branches for each option.
4. Click Download.
5. Once the model has finished downloading, it will say "Done".
6. In the top left, click the refresh icon next to Model.
7. In the Model dropdown, choose the model you just downloaded: GEITje-7B-chat-GPTQ
8. The model will automatically load and is now ready for use!
9. If you want any custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
   - Note that you do not need to and should not set manual GPTQ parameters anymore. These are set automatically from the file quantize_config.json.
10. Once you're ready, click the Text Generation tab and enter a prompt to get started!
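To check which GPTQ parameters a downloaded branch carries, the quantize_config.json mentioned in the steps above can be inspected directly. The following is a rough sketch: the key names (`bits`, `group_size`, `desc_act`, `damp_percent`) are assumed to follow the AutoGPTQ convention, and the sample JSON is illustrative, mirroring the values listed for the main branch rather than copied from the repository.

```python
import json


def summarise_quantize_config(text: str) -> str:
    """Return a one-line summary of a quantize_config.json document.

    Assumption: keys follow the AutoGPTQ naming; a group size of -1 means "None".
    """
    cfg = json.loads(text)
    group = cfg.get("group_size")
    group_label = "None" if group in (None, -1) else str(group)
    return (
        f"{cfg.get('bits')}-bit, group size {group_label}, "
        f"act order {cfg.get('desc_act')}, damp {cfg.get('damp_percent')}"
    )


# Illustrative sample, not the actual file contents.
sample = '{"bits": 4, "group_size": 128, "desc_act": true, "damp_percent": 0.1}'
print(summarise_quantize_config(sample))
```

For a real download, pass the contents of GEITje-7B-chat-GPTQ/quantize_config.json instead of the sample string.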

Serving the Model from Text Generation Inference (TGI)

It's recommended to use TGI version 1.1.0 or later. The official Docker container is: ghcr.io/huggingface/text-generation-inference:1.1.0

Example Docker parameters:

--model-id TheBloke/GEITje-7B-chat-GPTQ --port 3000 --quant
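Once a TGI server is running, it can be queried over HTTP. The sketch below builds a request for TGI's `/generate` endpoint using only the Python standard library. It assumes the server is reachable at http://127.0.0.1:3000 (the port comes from the example parameters; the host and port mapping depend on how the container is started), and `build_generate_request` is a hypothetical helper name. The prompt is wrapped in the template documented for this model.

```python
import json
import urllib.request


def build_generate_request(prompt, base_url="http://127.0.0.1:3000", max_new_tokens=128):
    """Build (but do not send) a POST request for TGI's /generate endpoint.

    Assumptions: base_url points at a running TGI server, and the prompt
    template ends with a newline after the assistant tag.
    """
    payload = {
        "inputs": f"<|user|>\n{prompt}\n<|assistant|>\n",
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires a running server, so it is left commented out:
# with urllib.request.urlopen(build_generate_request("Hallo, hoe gaat het?")) as resp:
#     print(json.load(resp)["generated_text"])
```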

✨ Features

Multiple GPTQ parameter permutations are provided, allowing users to choose the best option for their hardware and requirements.
The model supports various inference servers/webuis, including text-generation-webui, KoboldAI United, LoLLMS Web UI, and Hugging Face Text Generation Inference (TGI).

📦 Installation

Prerequisites

For GPTQ models, Linux (NVidia/AMD) and Windows (NVidia only) are currently supported. macOS users should use GGUF models.
Install the necessary libraries as described in the "Quick Start" section.

📚 Documentation

Model Information

Property	Details
Model Type	mistral
Base Model	Rijgersberg/GEITje-7B-chat
Training Data	Rijgersberg/no_robots_nl, Rijgersberg/ultrachat_10k_nl
Model Creator	Edwin Rijgersberg
Quantized By	TheBloke
License	apache-2.0

Prompt Template

<|user|>
{prompt}
<|assistant|>
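
When calling the model from code, the template has to be applied to each user message before generation. A minimal sketch follows; `format_prompt` is a hypothetical helper, and the trailing newline after the assistant tag is an assumption, since the template above does not show whether one is expected.

```python
# Assumption: a newline follows the assistant tag so the reply starts on a new line.
PROMPT_TEMPLATE = "<|user|>\n{prompt}\n<|assistant|>\n"


def format_prompt(prompt: str) -> str:
    """Wrap a single user message in the chat template."""
    return PROMPT_TEMPLATE.format(prompt=prompt.strip())


print(format_prompt("Wat is de hoofdstad van Nederland?"))
```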

Provided files, and GPTQ parameters

Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.

Each separate quant is in a different branch. See the "Downloading the Model" section for instructions on fetching from different branches.

Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with Transformers.

Explanation of GPTQ parameters

Bits: The bit size of the quantised model.
GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" (no grouping) uses the least VRAM of all.
Act Order: True or False. Also known as desc_act. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is default, but 0.1 results in slightly better accuracy.
GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences.
ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama and Mistral models in 4-bit.
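
The trade-off between bits, group size and VRAM can be made concrete with a back-of-the-envelope estimate. The sketch below is an approximation under stated assumptions, not the exact GPTQ storage format: it assumes each group stores one 16-bit scale plus one zero point of the same width as the weights, and it ignores act-order indices and unquantised layers. `effective_bits_per_weight` is a hypothetical helper.

```python
from typing import Optional


def effective_bits_per_weight(bits: int, group_size: Optional[int], scale_bits: int = 16) -> float:
    """Approximate storage cost per weight, including per-group metadata.

    Assumptions: one scale (scale_bits wide) and one zero point (bits wide)
    per group; with no group size the metadata is treated as negligible.
    """
    if group_size is None:
        return float(bits)
    return bits + (scale_bits + bits) / group_size


for bits, gs in [(4, 128), (4, 64), (4, 32), (8, None), (8, 128), (8, 32)]:
    print(f"{bits}-bit, GS {gs}: ~{effective_bits_per_weight(bits, gs):.3f} bits per weight")
```

Smaller groups store more metadata per weight, which is why a group size of 32 costs more VRAM than 128 at the same bit width.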

Branch	Bits	GS	Act Order	Damp %	GPTQ Dataset	Seq Len	Size	ExLlama	Desc
main	4	128	Yes	0.1	Dolly 15K Dutch	4096	4.16 GB	Yes	4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy.
gptq-4bit-32g-actorder_True	4	32	Yes	0.1	Dolly 15K Dutch	4096	4.57 GB	Yes	4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage.
gptq-8bit--1g-actorder_True	8	None	Yes	0.1	Dolly 15K Dutch	4096	7.52 GB	No	8-bit, with Act Order. No group size, to lower VRAM requirements.
gptq-8bit-128g-actorder_True	8	128	Yes	0.1	Dolly 15K Dutch	4096	7.68 GB	No	8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy.
gptq-8bit-32g-actorder_True	8	32	Yes	0.1	Dolly 15K Dutch	4096	8.17 GB	No	8-bit, with group size 32g and Act Order for maximum inference quality.
gptq-4bit-64g-actorder_True	4	64	Yes	0.1	Dolly 15K Dutch	4096	4.29 GB	Yes	4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy.
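
One way to choose between these branches programmatically is to filter the table by file size and ExLlama compatibility. The sketch below copies the branch names, sizes and compatibility flags from the table above; the `Quant` type and the `candidates` function are hypothetical. File size is used only as a rough lower bound for VRAM, since inference also needs memory for activations and the KV cache.

```python
from typing import List, NamedTuple, Optional


class Quant(NamedTuple):
    branch: str
    bits: int
    group_size: Optional[int]
    size_gb: float
    exllama: bool


# Values copied from the table of provided files.
QUANTS = [
    Quant("main", 4, 128, 4.16, True),
    Quant("gptq-4bit-32g-actorder_True", 4, 32, 4.57, True),
    Quant("gptq-8bit--1g-actorder_True", 8, None, 7.52, False),
    Quant("gptq-8bit-128g-actorder_True", 8, 128, 7.68, False),
    Quant("gptq-8bit-32g-actorder_True", 8, 32, 8.17, False),
    Quant("gptq-4bit-64g-actorder_True", 4, 64, 4.29, True),
]


def candidates(max_size_gb: float, need_exllama: bool = False) -> List[Quant]:
    """Return branches whose files fit the budget, largest first."""
    fits = [
        q for q in QUANTS
        if q.size_gb <= max_size_gb and (q.exllama or not need_exllama)
    ]
    return sorted(fits, key=lambda q: q.size_gb, reverse=True)


for q in candidates(max_size_gb=5.0, need_exllama=True):
    print(q.branch, q.size_gb)
```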

🔧 Technical Details

Quantisation

These files were quantised using hardware kindly provided by Massed Compute.

Known compatible clients / servers

GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.

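These GPTQ models are known to work in the following inference servers/webuis: text-generation-webui, KoboldAI United, LoLLMS Web UI, and Hugging Face Text Generation Inference (TGI).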

This may not be a complete list; if you know of others, please let me know!

📄 License

This model is licensed under the apache-2.0 license.
