🚀 Chronoboros 33B - GPTQ
This repository contains GPTQ model files for Chronoboros 33B, offering multiple quantisation options for different hardware and requirements.
📚 Documentation
Model Information
- Model creator: Henky!!
- Original model: Chronoboros 33B
Property | Details |
---|---|
Model Type | llama |
Base Model | Henk717/chronoboros-33B |
Quantized By | TheBloke |
License | other |
Repositories available
- AWQ model(s) for GPU inference.
- GPTQ models for GPU inference, with multiple quantisation parameter options.
- 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference
- Henky!!'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions
Prompt template: Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
Provided files and GPTQ parameters
Multiple quantisation parameters are provided, enabling you to select the most suitable one for your hardware and requirements. Each separate quant is in a different branch. See below for instructions on fetching from different branches.
All recent GPTQ files are created with AutoGPTQ, and all files in non-main branches are made with AutoGPTQ. Files in the main branch which were uploaded before August 2023 were made with GPTQ-for-LLaMa.
Explanation of GPTQ parameters
- Bits: The bit size of the quantised model.
- GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
- Act Order: True or False. Also known as desc_act. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
- Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is the default, but 0.1 results in slightly better accuracy.
- GPTQ dataset: The dataset used for quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
- Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences.
- ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4 - bit.
Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
---|---|---|---|---|---|---|---|---|---|
main | 4 | None | Yes | 0.01 | wikitext | 2048 | 16.94 GB | Yes | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
gptq-4bit-32g-actorder_True | 4 | 32 | Yes | 0.01 | wikitext | 2048 | 19.44 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
gptq-4bit-64g-actorder_True | 4 | 64 | Yes | 0.01 | wikitext | 2048 | 18.18 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
gptq-4bit-128g-actorder_True | 4 | 128 | Yes | 0.01 | wikitext | 2048 | 17.55 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
gptq-8bit--1g-actorder_True | 8 | None | Yes | 0.01 | wikitext | 2048 | 32.99 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
gptq-8bit-128g-actorder_False | 8 | 128 | No | 0.01 | wikitext | 2048 | 33.73 GB | No | 8-bit, with group size 128g for higher inference quality and without Act Order to improve AutoGPTQ speed. |
gptq-3bit--1g-actorder_True | 3 | None | Yes | 0.01 | wikitext | 2048 | 12.92 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
gptq-3bit-128g-actorder_False | 3 | 128 | No | 0.01 | wikitext | 2048 | 13.51 GB | No | 3-bit, with group size 128g but no act-order. Slightly higher VRAM requirements than 3-bit None. |
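Each branch also contains a quantize_config.json recording the parameters shown in the table above, so you can inspect a branch before downloading its full weights. The snippet below is a minimal sketch, not part of the original instructions: it assumes the huggingface_hub package is installed and that the file follows the usual AutoGPTQ schema.

from huggingface_hub import hf_hub_download
import json

# Fetch only the small quantize_config.json from a chosen branch (revision)
config_path = hf_hub_download(
    repo_id="TheBloke/Chronoboros-33B-GPTQ",
    filename="quantize_config.json",
    revision="gptq-4bit-32g-actorder_True",
)

with open(config_path) as f:
    print(json.load(f))  # expected keys include bits, group_size, desc_act, damp_percent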
🚀 Quick Start
How to download from branches
- In text-generation-webui, you can add :branch to the end of the download name, eg TheBloke/Chronoboros-33B-GPTQ:main
- With Git, you can clone a branch with:
git clone --single-branch --branch main https://huggingface.co/TheBloke/Chronoboros-33B-GPTQ
- In Python Transformers code, the branch is the revision parameter; see below. A huggingface_hub sketch is also shown just after this list.
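As an alternative to git, a specific branch can also be fetched with the huggingface_hub library. This is a hedged sketch rather than part of the original card; it assumes huggingface_hub is installed (pip3 install huggingface_hub), and the local directory name is chosen purely for illustration.

from huggingface_hub import snapshot_download

# Download one branch (revision) of the repo into a local folder
snapshot_download(
    repo_id="TheBloke/Chronoboros-33B-GPTQ",
    revision="gptq-4bit-128g-actorder_True",  # any branch from the table above
    local_dir="Chronoboros-33B-GPTQ",         # hypothetical target path
)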
How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
Please ensure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
It is highly recommended to use the text-generation-webui one-click-installers unless you're certain you know how to make a manual install.
- Click the Model tab.
- Under Download custom model or LoRA, enter TheBloke/Chronoboros-33B-GPTQ.
- To download from a specific branch, enter for example TheBloke/Chronoboros-33B-GPTQ:main - see Provided Files above for the list of branches for each option.
- Click Download.
- The model will start downloading. Once it's finished it will say "Done".
- In the top left, click the refresh icon next to Model.
- In the Model dropdown, choose the model you just downloaded: Chronoboros-33B-GPTQ.
- The model will automatically load, and is now ready for use!
- If you want any custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
- Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file quantize_config.json.
- Once you're ready, click the Text Generation tab and enter a prompt to get started!
How to use this GPTQ model from Python code
Install the necessary packages
Requires: Transformers 4.32.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
pip3 install "transformers>=4.32.0" "optimum>=1.12.0"
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
If you have problems installing AutoGPTQ using the pre - built wheels, install it from source instead:
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip3 install .
For CodeLlama models only: you must use Transformers 4.33.0 or later.
If 4.33.0 is not yet released when you read this, you will need to install Transformers from source:
pip3 uninstall -y transformers
pip3 install git+https://github.com/huggingface/transformers.git
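If you are unsure which Transformers version you ended up with, a quick check (a trivial sketch, not part of the original instructions) is:

import transformers

# Print the installed Transformers version to confirm it meets the requirement above
print(transformers.__version__)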
You can then use the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name_or_path = "TheBloke/Chronoboros-33B-GPTQ"
# To use a different branch, change revision
# For example: revision="main"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
prompt = "Tell me about AI"
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
'''
print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
# Inference can also be done using transformers' pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)
print(pipe(prompt_template)[0]['generated_text'])
Compatibility
The files provided are tested to work with AutoGPTQ, both via Transformers and using AutoGPTQ directly. They should also work with Occ4m's GPTQ-for-LLaMa fork.
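For the AutoGPTQ-direct route, the following minimal sketch (not from the original card) assumes auto-gptq 0.4.2 or later is installed and loads the main branch:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the quantised weights directly with AutoGPTQ instead of via Transformers
model = AutoGPTQForCausalLM.from_quantized("TheBloke/Chronoboros-33B-GPTQ",
                                           device="cuda:0",
                                           use_safetensors=True)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Chronoboros-33B-GPTQ", use_fast=True)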
ExLlama is compatible with Llama models in 4 - bit. Please see the Provided Files table above for per - file compatibility.
[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models.
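Once a TGI server is serving this model, it can be queried over HTTP. The snippet below is a hedged sketch rather than part of the original card: it assumes a server is already running locally on port 8080 and exposing TGI's standard /generate endpoint.

import requests

# Send the Alpaca-formatted prompt to a locally running TGI server
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nTell me about AI\n\n### Response:\n",
        "parameters": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.95},
    },
)
print(response.json()["generated_text"])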
Discord
For further support, and discussions on these models and AI in general, join us at:
Thanks, and how to contribute
Thanks to the chirper.ai team!

