Dorna-Llama3-8B-Instruct-Quantized4Bit: An Open-Source Model for Efficient Persian Language Processing

Dorna Llama3 8B Instruct Quantized4Bit

Developed by amirMohammadi

A 4-bit quantized version of Dorna-Llama3-8B-Instruct, a model fine-tuned for Persian, that uses Flash Attention 2 for more efficient inference

Large Language Model

Transformers

Supports Multiple Languages

#Persian optimization #4-bit quantization #Low VRAM inference

Downloads: 22

Release date: 6/8/2024

Model Overview

This is an 8B-parameter large language model based on the Llama 3 architecture. It is fine-tuned on Persian-language data and quantized to 4 bits to reduce memory usage, and it is suited to Persian text generation tasks.

Model Features

Memory optimization

4-bit quantization significantly reduces memory requirements, making the model suitable for resource-constrained environments

Inference acceleration

Flash Attention 2 integration speeds up processing

Persian optimization

Fine-tuned specifically on Persian-language data

Easy deployment

Ready to use out of the box with Transformers, without additional libraries such as llama.cpp or Candle
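Simple arithmetic shows where the memory savings come from: the memory needed to store the weights grows linearly with the number of bits per parameter. The sketch below is a back-of-the-envelope estimate. It assumes a nominal 8 × 10⁹ parameters and counts the weights only. It ignores activations, the KV cache, quantization metadata, and any layers kept in 16-bit precision, so real GPU usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 1024**3


# Nominal parameter count for an "8B" model (assumption; the exact count differs slightly).
NUM_PARAMS = 8e9

# Compare 16-bit, 8-bit and 4-bit storage for the same parameter count.
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

This prints roughly 14.9, 7.5 and 3.7 GiB. The weights of the 4-bit version therefore need about a quarter of the memory of a bf16 copy.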

Model Capabilities

Persian text generation

English text generation

Dialogue systems

Question answering systems

Text summarization

Use Cases

Language services

Persian intelligent assistant

For building Persian dialogue systems

Achieved a 55.77% win rate against the Persian Mind model in human evaluation (non-quantized version)

Cross-language QA system

Supports question answering services in Persian and English

Strong human-evaluation results on news QA, e.g., win rates of 72% (complex) and 70% (easy) against Persian Mind

Education

Language learning tool

Helps Persian language learners practice

🚀 Dorna-Llama3-8B-Instruct-Quantized4Bit

This project provides a 4-bit quantized version of the Dorna-Llama3-8B-Instruct model that reduces memory usage. Dorna is a decoder-only model fine-tuned on Persian data. This version also integrates Flash Attention 2 for faster inference.

🚀 Quick Start

You can run conversational inference using the Transformers Auto classes with the generate() function. Here is an example:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "amirMohammadi/Dorna-Llama3-8B-Instruct-Quantized4Bit"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" lets Accelerate place the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system",
     "content": "You are a helpful Persian assistant. Please answer questions in the asked language."},
    {"role": "user", "content": "اصفهان بزرگ تر است یا قم؟"},  # "Which is bigger, Isfahan or Qom?"
]

# Render the chat with the model's template and append the assistant header
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop at the generic EOS token or at Llama 3's end-of-turn token <|eot_id|>
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Keep only the newly generated tokens (drop the echoed prompt)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
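The Quick Start sends a single user turn. A dialogue system needs more than one turn, and a common pattern is to keep the full message history. After each generation, the decoded reply is appended as an `assistant` message before the next user turn. The updated list is then passed to `tokenizer.apply_chat_template` exactly as in the example. The helpers below (`new_conversation`, `add_turn`) are hypothetical names for a minimal sketch. Rejecting two consecutive messages from the same role is a sanity check chosen for this sketch, not a requirement stated by the model card.

```python
from typing import Dict, List

Message = Dict[str, str]

# Same system prompt as the Quick Start example.
SYSTEM_PROMPT = ("You are a helpful Persian assistant. "
                 "Please answer questions in the asked language.")


def new_conversation(system_prompt: str = SYSTEM_PROMPT) -> List[Message]:
    """Start a history containing only the system message."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one user or assistant message appended."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    if history and history[-1]["role"] == role:
        raise ValueError(f"two consecutive {role!r} messages")
    return history + [{"role": role, "content": content}]


# Usage sketch: in a real loop the assistant text would come from
# tokenizer.decode(...) as in the Quick Start.
history = new_conversation()
history = add_turn(history, "user", "اصفهان بزرگ تر است یا قم؟")
history = add_turn(history, "assistant", "<model reply here>")
history = add_turn(history, "user", "جمعیت هر کدام چقدر است؟")  # "What is the population of each?"
```

Because `add_turn` returns a new list instead of mutating its input, an earlier state of the conversation can be kept, for example to regenerate a reply.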

✨ Features

Reduced Memory Usage: 4-bit quantization lowers memory requirements.
Faster Inference: Flash Attention 2 speeds up processing.
Easy Deployment: No need for additional libraries such as llama.cpp or Candle.
Ready to Use: Compatible with LangChain, Haystack, LlamaIndex 2, and more.
Google Colab Friendly: Runs on the Google Colab free tier with a T4 GPU (less than 15 GB of GPU RAM).
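The card does not show how Flash Attention 2 is switched on. In Transformers it is requested with `attn_implementation="flash_attention_2"` in `from_pretrained`, and it requires the separate `flash-attn` package. FlashAttention-2 targets Ampere-or-newer GPUs (compute capability 8.0+), so it will not run on the free-tier T4 mentioned above, which has compute capability 7.5. The sketch below shows one way to choose an attention backend at load time. `pick_attn_implementation` and `load_dorna` are hypothetical names. Falling back to PyTorch's `"sdpa"` backend and to float16 on pre-Ampere GPUs are design choices for this sketch, not instructions from the model card.

```python
import importlib.util
from typing import Tuple

MODEL_PATH = "amirMohammadi/Dorna-Llama3-8B-Instruct-Quantized4Bit"


def pick_attn_implementation(compute_capability: Tuple[int, int],
                             flash_attn_installed: bool) -> str:
    """Return an attn_implementation value for Transformers' from_pretrained.

    FlashAttention-2 kernels target Ampere-or-newer GPUs (compute capability
    8.0+), so older GPUs such as the T4 fall back to PyTorch SDPA.
    """
    major, _minor = compute_capability
    if flash_attn_installed and major >= 8:
        return "flash_attention_2"
    return "sdpa"


def load_dorna(model_path: str = MODEL_PATH):
    """Hypothetical loader; needs torch, transformers, accelerate and a CUDA GPU."""
    # Heavy imports live here so the selection logic above is importable without a GPU.
    import torch
    from transformers import AutoModelForCausalLM

    capability = torch.cuda.get_device_capability()  # e.g. (7, 5) on a T4
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    # Design choice for this sketch: pre-Ampere GPUs lack native bfloat16, so use float16.
    dtype = torch.bfloat16 if capability[0] >= 8 else torch.float16
    return AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=dtype,
        device_map="auto",
        attn_implementation=pick_attn_implementation(capability, has_flash_attn),
    )
```

On a Colab T4 this sketch would select `"sdpa"`. On an A100 with `flash-attn` installed it would select `"flash_attention_2"`.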

📚 Documentation

Evaluation of the Non-Quantized Version

The non-quantized Dorna-Llama3-8B-Instruct model was evaluated on questions from several tasks: Boolean Questions, Code Generation, Long Response, Math, News QA, Paraphrasing, General Knowledge, and Summarization. Most categories have two difficulty levels, Easy and Hard; some categories label the harder level "Complex".

Both human evaluation and automatic evaluation (with GPT-4 as the judge) were performed. In the tables below, Dorna-8B-it is short for Dorna-Llama3-8B-Instruct.

Overall human evaluation results

Model Pairs	Opponent Parameters	Win %	Lose %	Tie %
Dorna-8B-it vs. Meta-Llama-3-8B-Instruct	8B	36.94	17.39	45.67
Dorna-8B-it vs. GPT-3.5-turbo-1106	N.A.	32.01	26.94	41.05
Dorna-8B-it vs. Persian Mind	7B	55.77	10.49	33.74

Category-based human evaluation results

Win/Lose/Tie rates are reported for each category as fractions (e.g., 0.25 = 25%).

Model Pairs	Opponent Parameters	Bool Complex	Bool Easy	Code Gen	General Long Response	Historical Long Response	Math Complex	Math Easy	News QA Complex	News QA Easy	Paraphrasing	General Knowledge Easy	General Knowledge Hard	Summarization
Dorna-8B-it vs. Meta-Llama-3-8B-Instruct	8B	0.25/0.25/0.5	0.28/0.35/0.38	0.6/0.1/0.3	0.8/0.08/0.12	0.4/0.3/0.3	0.28/0.08/0.65	0.47/0.00/0.53	0.55/0.07/0.38	0.43/0.15/0.42	0.1/0.05/0.85	0.31/0.2/0.49	0.59/0.13/0.28	0.28/0.2/0.53
Dorna-8B-it vs. GPT-3.5-turbo-1106	N.A.	0.35/0.35/0.3	0.3/0.3/0.4	0.1/0.3/0.6	0.2/0.45/0.35	0.46/0.27/0.27	0.25/0.1/0.65	0.05/0.1/0.85	0.12/0.35/0.53	0.15/0.1/0.75	0.25/0.15/0.6	0.3/0.32/0.38	0.22/0.53/0.25	0.35/0.55/0.1
Dorna-8B-it vs. Persian Mind	7B	0.47/0.25/0.28	0.57/0.15/0.28	0.9/0.1/0.0	0.82/0.08/0.1	0.4/0.17/0.42	0.3/0.0/0.7	0.22/0.08/0.7	0.72/0.07/0.2	0.7/0.0/0.3	0.7/0.05/0.25	0.51/0.12/0.37	0.61/0.1/0.29	0.93/0.0/0.07
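Each cell in the category table packs three fractions into one "win/lose/tie" string. The hypothetical helpers below are for reading the table and are not part of the model's tooling. They convert a cell to percentages, compute a net win rate (win minus lose), and check that the three parts sum to 1. The check allows a small rounding tolerance because some cells total 0.99 or 1.01.

```python
from typing import Tuple


def parse_cell(cell: str) -> Tuple[float, float, float]:
    """Split a 'win/lose/tie' cell such as '0.6/0.1/0.3' into three fractions."""
    win, lose, tie = (float(part) for part in cell.split("/"))
    return win, lose, tie


def sums_to_one(cell: str, tolerance: float = 0.015) -> bool:
    """True if win + lose + tie equals 1 up to the table's two-decimal rounding."""
    return abs(sum(parse_cell(cell)) - 1.0) <= tolerance


# Code Gen column: Dorna-8B-it vs. Meta-Llama-3-8B-Instruct and vs. Persian Mind
for cell in ["0.6/0.1/0.3", "0.9/0.1/0.0"]:
    win, lose, tie = parse_cell(cell)
    print(f"{cell:<12} win {win:.0%}  lose {lose:.0%}  tie {tie:.0%}  "
          f"net {win - lose:+.0%}  consistent={sums_to_one(cell)}")
```

A positive net value means Dorna-8B-it won more comparisons than it lost in that category. Ties are left out of the net figure.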

Automatic evaluation results

Model Pairs	Opponent Parameters	Overall Win Rate %	Easy Win Rate %	Hard Win Rate %
Dorna-8B-it vs. Llama 3 base	8B	58.96	56.00	64.49
Dorna-8B-it vs. Part Mistral	7B	77.20	73.00	85.05
Dorna-8B-it vs. Persian Mind	7B	90.88	87.50	97.20
Dorna-8B-it vs. Neuraorca Gemma 7b	7B	86.32	86.50	85.98
Dorna-8B-it vs. Maral 7b	7B	97.39	97.00	98.13
Dorna-8B-it vs. PersianLlama 7b	7B	98.70	98.00	100.00
Dorna-8B-it vs. Aya-23-8B	8B	52.77	56.50	45.79
Dorna-8B-it vs. Aya-23-35B	35B	45.93	54.00	30.84
Dorna-8B-it vs. Command R	35B	58.63	61.00	54.21
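The Overall column is not the simple mean of the Easy and Hard columns. Every row is consistent with a question-weighted average, (n_easy · easy + n_hard · hard) / (n_easy + n_hard), using 200 Easy and 107 Hard questions. Those counts are an inference from the percentages (for example, 64.49% ≈ 69/107) and are not stated in the card. The sketch below recomputes every row under that assumption.

```python
# Assumed question counts, inferred from the reported percentages (not stated in the card).
N_EASY = 200
N_HARD = 107


def overall_win_rate(easy_pct: float, hard_pct: float,
                     n_easy: int = N_EASY, n_hard: int = N_HARD) -> float:
    """Question-weighted average of the Easy and Hard win rates, in percent."""
    return (easy_pct * n_easy + hard_pct * n_hard) / (n_easy + n_hard)


# (opponent, easy %, hard %, reported overall %) copied from the table above
rows = [
    ("Llama 3 base", 56.00, 64.49, 58.96),
    ("Part Mistral", 73.00, 85.05, 77.20),
    ("Persian Mind", 87.50, 97.20, 90.88),
    ("Neuraorca Gemma 7b", 86.50, 85.98, 86.32),
    ("Maral 7b", 97.00, 98.13, 97.39),
    ("PersianLlama 7b", 98.00, 100.00, 98.70),
    ("Aya-23-8B", 56.50, 45.79, 52.77),
    ("Aya-23-35B", 54.00, 30.84, 45.93),
    ("Command R", 61.00, 54.21, 58.63),
]

for name, easy, hard, reported in rows:
    computed = overall_win_rate(easy, hard)
    print(f"{name:<20} computed {computed:6.2f}  reported {reported:6.2f}")
```

Take Aya-23-35B as an example. The simple mean of its Easy and Hard rates would be 42.42%, but the weighted average reproduces the reported 45.93% because the Easy set is roughly twice the size of the Hard set.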

📄 License

This model is released under the Llama 3 license.

📞 Contact us

If you have any questions about this model, you can reach us through its community page on Hugging Face.
