🚀 zhaav-gemma3-4B
The alifzl/zhaav-gemma3-4B_q8_0.gguf model is a Persian-specific model fine-tuned on the Gemma 3 architecture. By leveraging QLoRA's 4-bit quantization, it reduces computational requirements while retaining strong performance in generating and understanding Persian text, which makes it suitable for running on commodity hardware without a GPU.
🚀 Quick Start
This model is compatible with both the Hugging Face Transformers library and Ollama.
✨ Features
- Persian-specific fine-tuning based on Gemma 3 architecture.
- Utilizes QLoRA's 4-bit quantization to reduce computational demands.
- Suitable for running on commodity hardware without GPUs.
📦 Installation
Running with Ollama
```bash
ollama run hf.co/alifzl/zhaav-gemma3-4B:Q8_0
```
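Once the model has been pulled, you can also query it programmatically through Ollama's local HTTP API (served at localhost:11434 by default). Below is a minimal sketch using the requests library; the model tag mirrors the ollama run command above.

```python
import requests

# Send a chat request to the local Ollama server (default port 11434).
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/alifzl/zhaav-gemma3-4B:Q8_0",
        "messages": [
            # "What's the difference between mocha and Americano coffee?"
            {"role": "user", "content": "تفاوت قهوه موکا با آمریکانو چیه؟"}
        ],
        "stream": False,  # return the full answer in a single JSON response
    },
)
print(response.json()["message"]["content"])
```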
Running with Hugging Face Transformers
- Install Dependencies:
```bash
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "alifzl/zhaav-gemma3-4B_q8_0.gguf"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        # "What's the difference between mocha and Americano coffee?"
        "content": "تفاوت قهوه موکا با آمریکانو چیه؟",
    }
]

# With return_dict=True, apply_chat_template returns a dict of tensors
# (input_ids and attention_mask) that can be unpacked into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
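For interactive use, output can be streamed token by token instead of printed all at once. Here is a minimal sketch using the TextStreamer utility from Transformers, reusing the model, tokenizer, and inputs from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the prompt text.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```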
📚 Documentation
Training Data and Fine-Tuning
Training Dataset
Fine-tuning was conducted using the mshojaei77/Persian_sft dataset, which contains approximately 680k rows of Persian text focused on instruction-following and conversational interactions.
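To inspect the dataset yourself, it can be loaded with the Hugging Face datasets library. A minimal sketch follows; the split name and column layout are assumptions, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the Persian SFT dataset from the Hugging Face Hub.
dataset = load_dataset("mshojaei77/Persian_sft", split="train")  # split name assumed

print(dataset)     # number of rows and column names
print(dataset[0])  # first record; column layout may differ
```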
Fine-Tuning
- Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
- Hardware: one T4 GPU
- Software: Hugging Face Transformers, with supporting libraries such as peft for QLoRA and bitsandbytes for quantization (a configuration sketch follows this list)
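The exact training hyperparameters are not published. For orientation, a typical QLoRA setup with these libraries looks roughly like the sketch below; the base model checkpoint, LoRA rank, alpha, and target modules are all illustrative assumptions, not the settings actually used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization via bitsandbytes (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 suits a T4, which lacks bfloat16 support
)

# Base model loaded in 4-bit; "google/gemma-3-4b-it" is an assumed starting point.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections; rank and alpha are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```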
Evaluation Results
| Benchmark | Score |
|-----------|-------|
| Avg. | 22.04 |
| IFEval (0-shot) | 43.58 |
| BBH (3-shot) | 31.87 |
| MATH Lvl 5 (4-shot) | 11.10 |
| GPQA (0-shot) | 6.49 |
| MuSR (0-shot) | 9.49 |
| MMLU-PRO (5-shot) | 29.70 |
🔧 Technical Details
- Model Type: Fine-tuned Persian model based on Gemma 3 architecture with QLoRA 4-bit quantization.
- Training Data: mshojaei77/Persian_sft dataset with about 680k rows of Persian text for instruction-following and conversations.
- Fine-Tuning Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization).
- Hardware: One T4 GPU.
- Software: Hugging Face Transformers, with peft for QLoRA and bitsandbytes for quantization.
📄 License
This model is released under the Gemma license.
Future Work
- Add additional evaluation metrics and benchmarks.
- Expand documentation and usage examples.