🚀 zhaav-gemma3-4B
The alifzl/zhaav-gemma3-4B_q8_0.gguf model is a Persian-specific model fine-tuned on the Gemma 3 architecture. By leveraging QLoRA's 4-bit quantization, it reduces computational requirements while retaining strong performance in generating and understanding Persian text, which makes it suitable for running on commodity hardware without a GPU.
🚀 Quick Start
This model is compatible with both the Hugging Face Transformers library and Ollama.
✨ Features
- Persian-specific fine-tuning based on Gemma 3 architecture.
- Utilizes QLoRA's 4-bit quantization to reduce computational demands.
- Suitable for running on commodity hardware without GPUs.
📦 Installation
Running with Ollama
```bash
ollama run hf.co/alifzl/zhaav-gemma3-4B:Q8_0
```
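Once the model has been pulled, you can also query it programmatically through Ollama's local HTTP API (served at localhost:11434 by default). Below is a minimal sketch using the requests library; the model tag mirrors the ollama run command above.

```python
import requests

# Send a chat request to the local Ollama server (default port 11434).
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/alifzl/zhaav-gemma3-4B:Q8_0",
        "messages": [
            # "What's the difference between mocha and Americano coffee?"
            {"role": "user", "content": "تفاوت قهوه موکا با آمریکانو چیه؟"}
        ],
        "stream": False,  # return the full answer in a single JSON response
    },
)
print(response.json()["message"]["content"])
```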
Running with Hugging Face Transformers
- Install Dependencies:
```bash
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "alifzl/zhaav-gemma3-4B_q8_0.gguf"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        # "What's the difference between mocha and Americano coffee?"
        "content": "تفاوت قهوه موکا با آمریکانو چیه؟",
    }
]

# With return_dict=True, apply_chat_template returns a dict of tensors
# (input_ids and attention_mask) that can be unpacked into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
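For interactive use, output can be streamed token by token instead of printed all at once. Here is a minimal sketch using the TextStreamer utility from Transformers, reusing the model, tokenizer, and inputs from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the prompt text.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```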
📚 Documentation
Training Data and Fine-Tuning
Training Dataset
Fine-tuning was conducted using the mshojaei77/Persian_sft dataset, which contains approximately 680k rows of Persian text focused on instruction-following and conversational interactions.
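To inspect the dataset yourself, it can be loaded with the Hugging Face datasets library. A minimal sketch follows; the split name and column layout are assumptions, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Load the Persian SFT dataset from the Hugging Face Hub.
dataset = load_dataset("mshojaei77/Persian_sft", split="train")  # split name assumed

print(dataset)     # number of rows and column names
print(dataset[0])  # first record; column layout may differ
```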
Fine-Tuning
- Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
- Hardware: one T4 GPU
- Software: Hugging Face Transformers, with supporting libraries such as peft for QLoRA and bitsandbytes for quantization (a configuration sketch follows this list)
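The exact training hyperparameters are not published. For orientation, a typical QLoRA setup with these libraries looks roughly like the sketch below; the base model checkpoint, LoRA rank, alpha, and target modules are all illustrative assumptions, not the settings actually used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization via bitsandbytes (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 suits a T4, which lacks bfloat16 support
)

# Base model loaded in 4-bit; "google/gemma-3-4b-it" is an assumed starting point.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections; rank and alpha are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```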
Evaluation Results
| Benchmark | Score |
|-----------|-------|
| Avg. | 22.04 |
| IFEval (0-shot) | 43.58 |
| BBH (3-shot) | 31.87 |
| MATH Lvl 5 (4-shot) | 11.10 |
| GPQA (0-shot) | 6.49 |
| MuSR (0-shot) | 9.49 |
| MMLU-PRO (5-shot) | 29.70 |
🔧 Technical Details
- Model Type: Fine-tuned Persian model based on Gemma 3 architecture with QLoRA 4-bit quantization.
- Training Data: mshojaei77/Persian_sft dataset with about 680k rows of Persian text for instruction-following and conversations.
- Fine-Tuning Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization).
- Hardware: One T4 GPU.
- Software: Hugging Face Transformers, with peft for QLoRA and bitsandbytes for quantization.
📄 License
This model is released under the Gemma license.
Future Work
- Add additional evaluation metrics and benchmarks.
- Expand documentation and usage examples.