AceGPT-13B-chat-AWQ Open-source Chat Model - Free Support for English and Arabic, Efficient Inference on Common GPUs

AceGPT 13B Chat AWQ

Quantized by MohamedRashad (original model by FreedomIntelligence)

The AWQ-quantized version of AceGPT 13B Chat. It supports English and Arabic and offers efficient 4-bit quantized inference on commonly available GPUs.

Large Language Model

Transformers

Supports Multiple Languages #Arabic Large Language Model #4-bit Quantized Inference #Multilingual Dialogue

Downloads 37

Release Time : 11/16/2023

Model Overview

AceGPT 13B Chat is a large language model based on the Llama2 architecture, processed with AWQ quantization, supporting English and Arabic, suitable for text generation and dialogue tasks.

Model Features

Efficient Quantization

Uses the AWQ method for 4-bit weight quantization, which speeds up inference while maintaining high output quality.

Multilingual Support

Supports English and Arabic, with special optimization for Arabic.

Low Resource Requirements

The 4-bit quantized weights reduce memory requirements, so the model can run on commonly available GPUs.

Model Capabilities

Text Generation

Multilingual Dialogue

Arabic Text Processing

Use Cases

Language Processing

Arabic Poetry Generation

Generate Arabic poetry or answer questions about Arabic culture.

Capable of generating poetry and responses that align with Arabic cultural contexts.

Multilingual Customer Service

Used for customer service dialogue systems supporting English and Arabic.

Provides a smooth multilingual dialogue experience.

🚀 AceGPT 13B Chat - AWQ

This repository provides AWQ model files for FreedomIntelligence's AceGPT 13B Chat, aiming to make Arabic LLMs accessible to users with modest GPUs.

🚀 Quick Start

Prerequisites

Transformers 4.35.0 or later.
AutoAWQ 0.1.6 or later.

Installation

pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"

⚠️ Important Note

If you are using PyTorch 2.0.1, the above AutoAWQ command will automatically upgrade you to PyTorch 2.1.0. If you are using CUDA 11.8 and wish to continue using PyTorch 2.0.1, run this command instead:

pip3 install https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl
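The wheel filename in the command above follows the standard wheel naming convention, so its tags show what it was built for: `+cu118` is a local version label (here marking the CUDA 11.8 build), `cp310` is the CPython 3.10 interpreter and ABI tag, and `linux_x86_64` is the platform. The following is a minimal sketch for checking those tags before installing; the helper names `parse_wheel_filename` and `matches_running_python` are hypothetical and not part of pip or AutoAWQ.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its standard components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 6:
        # An optional build tag may sit between the version and the Python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename layout: " + filename)
    # Anything after '+' is the local version label, e.g. 'cu118'.
    public_version, _, local_label = version.partition("+")
    return {
        "name": name,
        "version": public_version,
        "local_label": local_label,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(python_tag):
    """True if a CPython tag such as 'cp310' matches the current interpreter."""
    if sys.implementation.name != "cpython":
        return False
    return python_tag == "cp{}{}".format(*sys.version_info[:2])


WHEEL = "autoawq-0.1.6+cu118-cp310-cp310-linux_x86_64.whl"
info = parse_wheel_filename(WHEEL)
print(info)
print("matches this interpreter:", matches_running_python(info["python_tag"]))
```

If the Python tag does not match the running interpreter, pip will refuse this wheel, and installing from source is the likely fallback.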

💡 Usage Tip

If you have problems installing AutoAWQ using the pre-built wheels, install it from source instead:

pip3 uninstall -y autoawq
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip3 install .
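After installing, it can be useful to confirm that the environment meets the minimum versions listed under Prerequisites. The sketch below uses only the standard library; `version_tuple`, `meets_minimum`, and `check_requirements` are hypothetical helper names, and the version comparison is deliberately simple (it ignores pre-release suffixes), so treat it as a quick sanity check rather than a full version parser.

```python
from importlib import metadata

# Minimum versions taken from the Prerequisites section of this card.
REQUIRED = {"transformers": "4.35.0", "autoawq": "0.1.6"}


def version_tuple(version):
    """Turn '4.35.0' or '0.1.6+cu118' into a tuple of ints such as (4, 35, 0)."""
    public = version.split("+", 1)[0]  # drop a local label like '+cu118'
    numbers = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # stop at non-numeric pieces such as 'dev0'
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(required=REQUIRED):
    """Return a short status string for each required package."""
    report = {}
    for package, minimum in required.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        report[package] = "{} ({})".format(installed, status)
    return report


for package, status in check_requirements().items():
    print(package, "->", status)
```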

✨ Features

Model Creator: FreedomIntelligence
Original Model: AceGPT 13B Chat
Quantized by: MohamedRashad
Supported Languages: English (en), Arabic (ar)
Library Name: transformers

About AWQ

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared with the most commonly used GPTQ settings, it offers faster Transformers-based inference at equivalent or better quality. It is supported by the following platforms:

Text Generation Webui - using Loader: AutoAWQ
vLLM - Llama and Mistral models only
Hugging Face Text Generation Inference (TGI)
Transformers version 4.35.0 and later, from any code or client that supports Transformers
AutoAWQ - for use from Python code
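To give a rough sense of why 4-bit weights lower the hardware bar, the arithmetic below compares weight storage at 16 bits with a 4-bit group-quantized layout. It is a back-of-the-envelope sketch under stated assumptions, not a measurement: it takes a nominal 13 billion parameters, a group size of 128 (the value used in the quantization config later in this card), one 16-bit scale and one 4-bit zero point per group, and it treats every parameter as quantized. In practice some tensors (for example embeddings) typically stay in 16-bit, and activations and the KV cache need additional memory.

```python
GIB = 1024 ** 3
NUM_PARAMS = 13e9  # nominal "13B"; the exact parameter count is an assumption


def weight_bytes(num_params, bits_per_weight):
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


def awq_bits_per_weight(w_bit=4, group_size=128, scale_bits=16, zero_bits=4):
    """Effective bits per weight, counting per-group scale and zero point.

    The scale and zero-point widths are assumptions made for this estimate.
    """
    return w_bit + (scale_bits + zero_bits) / group_size


fp16_bytes = weight_bytes(NUM_PARAMS, 16)
awq_bytes = weight_bytes(NUM_PARAMS, awq_bits_per_weight())

print("fp16 weights : {:.1f} GiB".format(fp16_bytes / GIB))
print("4-bit weights: {:.1f} GiB".format(awq_bytes / GIB))
print("reduction    : {:.2f}x".format(fp16_bytes / awq_bytes))
```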

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name_or_path = "MohamedRashad/AceGPT-13B-chat-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side="right")
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    use_flash_attention_2=True, # disable if you have problems with flash attention 2
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto"
)

# Using the text streamer to stream output one token at a time
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "ما أجمل بيت شعر فى اللغة العربية ؟"
prompt_template=f'''[INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا.  يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تعرف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n
[INST] {prompt} [/INST]
'''

# Convert prompt to tokens
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

generation_params = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_new_tokens": 512,
    "repetition_penalty": 1.1
}

# Generate streamed output, visible one token at a time
generation_output = model.generate(
    tokens,
    streamer=streamer,
    **generation_params
)

# Generation without a streamer, which will include the prompt in the output
generation_output = model.generate(
    tokens,
    **generation_params
)

# Get the tokens from the output, decode them, print them
token_output = generation_output[0]
text_output = tokenizer.decode(token_output)
print("model.generate output: ", text_output)

# Inference is also possible via Transformers' pipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    **generation_params
)

pipe_output = pipe(prompt_template)[0]['generated_text']
print("pipeline output: ", pipe_output)
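As the listing notes, calling `model.generate` without a streamer returns the prompt tokens followed by the newly generated ones. For a decoder-only model like this one, a common way to keep only the reply is to slice off the first `len(prompt)` token ids before decoding. The sketch below shows the idea on plain Python lists so it runs without a GPU; `strip_prompt` is a hypothetical helper, and the token ids are made-up stand-ins.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the token ids generated after the prompt."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt is longer than the generated sequence")
    return output_ids[prompt_length:]


# Stand-in token ids; real ids would come from the tokenizer and model.generate.
prompt_ids = [1, 518, 25580, 29962]
output_ids = prompt_ids + [450, 1234, 2]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)

# With the objects from the listing above, the same slice would look like:
#   new_ids = generation_output[0][tokens.shape[1]:]
#   reply = tokenizer.decode(new_ids, skip_special_tokens=True)
```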

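Advanced Usage

The script below quantizes the original FreedomIntelligence/AceGPT-13B-chat model to 4-bit AWQ with AutoAWQ, saves the result, reloads it with Transformers, and pushes it to the Hub.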

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "FreedomIntelligence/AceGPT-13B-chat"
quant_path = "AceGPT-13B-chat-AWQ"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
load_config = {
    "low_cpu_mem_usage": True,
    "device_map": "auto",
    "trust_remote_code": True,
}
# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path, **load_config)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(quant_path)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Push to hub
model.push_to_hub(quant_path)
tokenizer.push_to_hub(quant_path)
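The `quant_config` keys `w_bit`, `q_group_size`, and `zero_point` describe a group-wise, asymmetric integer format: weights are split into groups, and each group gets its own scale and zero point. The sketch below illustrates only that storage format in pure Python. It is not AutoAWQ's implementation, and it omits the activation-aware scaling step that gives AWQ its name; the function names are hypothetical, and widening each group's range to include 0.0 is a simplification chosen here so the zero point always fits in the 4-bit range.

```python
import random


def quantize_group(weights, w_bit=4):
    """Asymmetric (zero-point) quantization of one group of floats."""
    q_max = 2 ** w_bit - 1  # 15 for 4-bit codes
    # Simplification: make sure the range contains 0.0.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / q_max
    if scale == 0.0:
        scale = 1.0  # all-zero group; any positive scale works
    zero = min(q_max, max(0, round(-w_min / scale)))
    codes = [min(q_max, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [(c - zero) * scale for c in codes]


def quantize(weights, group_size=128, w_bit=4):
    """Quantize a flat list of weights group by group."""
    return [
        quantize_group(weights[start:start + group_size], w_bit)
        for start in range(0, len(weights), group_size)
    ]


# Toy data: 256 small random weights, i.e. two groups of 128.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]

groups = quantize(weights, group_size=128, w_bit=4)
restored = []
for codes, scale, zero in groups:
    restored.extend(dequantize_group(codes, scale, zero))

max_error = max(abs(a - b) for a, b in zip(weights, restored))
largest_scale = max(scale for _, scale, _ in groups)
print("groups:", len(groups))
print("max round-trip error:", max_error)
print("half of largest scale:", largest_scale / 2)
```

The round-trip error of each weight is bounded by half of its group's scale, which is why a smaller group size tends to improve accuracy at the cost of storing more scales and zero points.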

📚 Documentation

Prompt template: Unknown

[INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا.  يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تعرف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n
[INST] {prompt} [/INST]
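For convenience, the template can be wrapped in a small function. The sketch below mirrors the string used in the Basic Usage listing character for character, including the second `[INST]` marker and the trailing newline; `build_prompt` is a hypothetical helper name, and the short English system message in the demo is only a stand-in for the Arabic system prompt shown above.

```python
def build_prompt(user_message, system_message):
    """Fill the card's prompt template with a system and a user message."""
    return (
        "[INST] <<SYS>>\n"
        + system_message
        + "\n<</SYS>>\n\n\n"
        + "[INST] " + user_message + " [/INST]\n"
    )


# Stand-in messages for illustration only.
demo = build_prompt("ما أجمل بيت شعر فى اللغة العربية ؟", "You are a helpful assistant.")
print(demo)
```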

🔧 Technical Details

Model Information

Property	Details
Base Model	FreedomIntelligence/AceGPT-13B-chat
Model Creator	FreedomIntelligence
Model Name	AceGPT 13B chat
Model Type	llama2
Quantized By	MohamedRashad
Training Datasets	FreedomIntelligence/Arabic-Vicuna-80, FreedomIntelligence/Arabic-AlpacaEval, FreedomIntelligence/MMLU_Arabic, FreedomIntelligence/EXAMs, FreedomIntelligence/ACVA-Arabic-Cultural-Value-Alignment
Supported Languages	en, ar
Library Name	transformers

License

The model is licensed under the llama2 license.
