# Athena v4 - GPTQ

This repository provides GPTQ model files for Athena v4, offering multiple quantisation options for different hardware and requirements.
## Quick Start

To start using the Athena v4 - GPTQ model quickly, follow the download and usage instructions below.
## Features

- Multiple Quantisation Options: multiple GPTQ parameter permutations are provided, allowing you to choose the best one for your hardware and requirements.
- Multiple Download Methods: each quantisation is stored in its own branch, and branches can be downloaded through text-generation-webui, the command line, or git.
- Easy to Use in text-generation-webui: you can easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
- Serving with TGI: the model can be served from Text Generation Inference (TGI).
## Installation

### In text-generation-webui

To download from the `main` branch, enter `TheBloke/Athena-v4-GPTQ` in the "Download model" box.

To download from another branch, add `:branchname` to the end of the download name, e.g. `TheBloke/Athena-v4-GPTQ:gptq-4bit-32g-actorder_True`.
### From the command line

I recommend using the `huggingface-hub` Python library:

```shell
pip3 install huggingface-hub
```

To download the `main` branch to a folder called `Athena-v4-GPTQ`:

```shell
mkdir Athena-v4-GPTQ
huggingface-cli download TheBloke/Athena-v4-GPTQ --local-dir Athena-v4-GPTQ --local-dir-use-symlinks False
```

To download from a different branch, add the `--revision` parameter:

```shell
mkdir Athena-v4-GPTQ
huggingface-cli download TheBloke/Athena-v4-GPTQ --revision gptq-4bit-32g-actorder_True --local-dir Athena-v4-GPTQ --local-dir-use-symlinks False
```
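The same downloads can also be scripted from Python with the `huggingface_hub` library installed above. A minimal sketch using `snapshot_download` (the local folder name is just an example):

```python
from huggingface_hub import snapshot_download

# Download the gptq-4bit-32g-actorder_True branch to a local folder;
# omit `revision` to fetch the main branch instead.
snapshot_download(
    repo_id="TheBloke/Athena-v4-GPTQ",
    revision="gptq-4bit-32g-actorder_True",
    local_dir="Athena-v4-GPTQ",
    local_dir_use_symlinks=False,
)
```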
### With git (not recommended)

To clone a specific branch with `git`, use a command like this:

```shell
git clone --single-branch --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/Athena-v4-GPTQ
```

Git is not recommended for HF repos: it is slower than `huggingface-hub`, and with Git LFS the large model files end up stored twice (once in the working tree and again under `.git`), doubling disk usage.
## Usage Examples
### Use in text-generation-webui

1. Click the Model tab.
2. Under Download custom model or LoRA, enter `TheBloke/Athena-v4-GPTQ`.
   - To download from a specific branch, enter for example `TheBloke/Athena-v4-GPTQ:gptq-4bit-32g-actorder_True`; see Provided files below for the list of branches for each option.
3. Click Download.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to Model.
6. In the Model dropdown, choose the model you just downloaded: `Athena-v4-GPTQ`.
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
   - Note that you no longer need to (and should not) set GPTQ parameters manually; they are read automatically from the file `quantize_config.json`.
9. Once you're ready, click the Text Generation tab and enter a prompt to get started!
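To use the model from Python instead, here is a minimal sketch with the Transformers library. It assumes `transformers`, `optimum`, `auto-gptq` and `accelerate` are installed; the generation settings are illustrative, and the prompt uses the Alpaca template documented below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Athena-v4-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the quantised weights on the available GPU(s).
# To load a non-main branch, add e.g. revision="gptq-4bit-32g-actorder_True".
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the instruction in the model's Alpaca prompt template.
instruction = "Tell me about AI"
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```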
### Serving this model from Text Generation Inference (TGI)

It's recommended to use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`

Example Docker parameters (note `--quantize gptq` for these GPTQ files):

```shell
--model-id TheBloke/Athena-v4-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
```
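Once the container is up, it can be queried from Python. A minimal sketch using `huggingface_hub.InferenceClient`; the endpoint URL assumes the `--port 3000` setting above, and the generation settings are illustrative:

```python
from huggingface_hub import InferenceClient

# Point the client at the locally running TGI container.
client = InferenceClient(model="http://127.0.0.1:3000")

# Wrap the instruction in the model's Alpaca prompt template (see Documentation below).
instruction = "Tell me about AI"
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

print(client.text_generation(prompt, max_new_tokens=512, temperature=0.7))
```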
## Documentation
### Model Information

- Model creator: IkariDev + Undi95
- Original model: [Athena v4](https://huggingface.co/IkariDev/Athena-v4)
- Model type: llama
- Prompt template: Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```
### Repositories available

- [AWQ model(s) for GPU inference](https://huggingface.co/TheBloke/Athena-v4-AWQ)
- [GPTQ models for GPU inference, with multiple quantisation parameter options](https://huggingface.co/TheBloke/Athena-v4-GPTQ)
- [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Athena-v4-GGUF)
- [IkariDev + Undi95's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/IkariDev/Athena-v4)
### Provided files, and GPTQ parameters
Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
Each separate quant is in a different branch. See below for instructions on fetching from different branches.
Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with Transformers.
### Explanation of GPTQ parameters
- Bits: The bit size of the quantised model.
- GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
- Act Order: True or False. Also known as `desc_act`. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
- Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is the default, but 0.1 results in slightly better accuracy.
- GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
- Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences.
- ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4-bit.
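To make these parameters concrete, here is a sketch of how they map onto the `GPTQConfig` class in Transformers when quantising a model yourself. The values mirror the `main` branch settings in the table below; this is an illustration, not the exact script used to produce these files:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "IkariDev/Athena-v4"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Bits=4, GS=128, Act Order (desc_act) on, Damp %=0.1, wikitext calibration dataset.
gptq_config = GPTQConfig(
    bits=4,
    group_size=128,
    desc_act=True,
    damp_percent=0.1,
    dataset="wikitext2",
    tokenizer=tokenizer,
)

# Quantise the fp16 model with the settings above (requires optimum + auto-gptq).
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
```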
| Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
| ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
| [main](https://huggingface.co/TheBloke/Athena-v4-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.26 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Athena-v4-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 8.00 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
| [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/Athena-v4-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.36 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Athena-v4-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 13.65 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
| [gptq-8bit-32g-actorder_True](https://huggingface.co/TheBloke/Athena-v4-GPTQ/tree/gptq-8bit-32g-actorder_True) | 8 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 14.54 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
| [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/Athena-v4-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 4096 | 7.51 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
## Technical Details

This section concerns the implementation details of the GPTQ quantisation, including how the quantisation parameters are handled and how the calibration dataset is chosen. For specifics, see the Explanation of GPTQ parameters above.
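Each branch also ships a `quantize_config.json` file recording the settings used, which loaders such as text-generation-webui read automatically (see the usage steps above). As an illustration only, a file matching the `main` branch settings would look roughly like this; the actual file may contain additional fields:

```json
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.1,
  "desc_act": true,
  "sym": true,
  "true_sequential": true
}
```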
## License

The creator of the source model has listed its license as `cc-by-nc-4.0`, and this quantisation has therefore used that same license.
As this model is based on Llama 2, it is also subject to the Meta Llama 2 license terms, and the license files for that are additionally included. It should therefore be considered as being claimed to be licensed under both licenses. I contacted Hugging Face for clarification on dual licensing but they do not yet have an official position. Should this change, or should Meta provide any feedback on this situation, I will update this section accordingly.
In the meantime, any questions regarding licensing, and in particular how these two licenses might interact, should be directed to the original model repository: [IkariDev + Undi95's Athena v4](https://huggingface.co/IkariDev/Athena-v4).

