🚀 Gemma 3-4B Persian (v0)
mshojaei77/gemma-3-4b-persian-v0
is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA (4-bit quantization), which keeps the memory and compute cost of adaptation low while targeting Persian text generation and understanding. In addition to text generation, it retains the image input capabilities of its multimodal base model.

🚀 Quick Start
This model is compatible with both the Hugging Face Transformers library and Ollama.
💻 Usage Examples
Basic Usage
Running with Ollama
ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
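If you prefer calling the model programmatically, Ollama also exposes a local REST API. Below is a minimal sketch, assuming Ollama is running on its default port (11434), the model tag above has been pulled, and the requests package is installed; the prompt string is only illustrative.

import requests

# Call Ollama's local /api/generate endpoint for a single, non-streaming reply.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0",
        "prompt": "پایتخت ایران کجاست؟",  # "What is the capital of Iran?"
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])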
Running with Hugging Face Transformers
- Install Dependencies:
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
- Load Model and Tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"

# Load the model across available devices in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": "توماس جفرسون کیست؟",  # "Who is Thomas Jefferson?"
    }
]

# return_dict=True is needed so that generate() can unpack the encoded
# inputs (including the attention mask) with **inputs.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
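Because the model was trained with 4-bit QLoRA, you can optionally also load the checkpoint in 4-bit at inference time to cut memory usage. This is a sketch, assuming bitsandbytes is installed and a CUDA GPU is available; expect some loss of precision, as noted under Limitations.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Illustrative 4-bit loading config (NF4 is the quantization type used by QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "mshojaei77/gemma-3-4b-persian-v0",
    device_map="auto",
    quantization_config=bnb_config,
)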
📚 Documentation
📦 Training Data and Fine-Tuning
Training Dataset
This model was fine-tuned using the mshojaei77/Persian_sft dataset, which contains approximately 681,000 rows of Persian text focused on instruction-following and conversational interactions.
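To inspect the data yourself, the dataset can be loaded with the datasets library. The split name below is an assumption based on the usual Hugging Face convention; adjust if the dataset uses a different layout.

from datasets import load_dataset

# Assumes the default "train" split.
ds = load_dataset("mshojaei77/Persian_sft", split="train")
print(ds[0])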
Fine-Tuning
- Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization); a minimal sketch of this setup follows this list
- Hardware: a single NVIDIA T4 GPU
- Software: Hugging Face Transformers, with peft for the QLoRA adapters and bitsandbytes for 4-bit quantization
- Trade-offs: reduced memory footprint at the expense of some precision compared to full-precision fine-tuning
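For readers who want to reproduce a comparable setup, here is a minimal QLoRA SFT sketch. It assumes a recent version of trl's SFTTrainer; the LoRA rank, target modules, and training arguments are illustrative, not the actual configuration used for this model.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    device_map="auto",
    quantization_config=bnb_config,
)

# Illustrative LoRA hyperparameters -- not the exact values used for this model.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Assumes the dataset's columns are in a chat format SFTTrainer can consume;
# add preprocessing if the actual schema differs.
trainer = SFTTrainer(
    model=model,
    train_dataset=load_dataset("mshojaei77/Persian_sft", split="train"),
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gemma-3-4b-persian-qlora",
        per_device_train_batch_size=1,
    ),
)
trainer.train()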
🔧 Evaluation
Evaluation results are not yet available; benchmarks for this variant will be added in a future update.
📄 Usage Considerations and Limitations
Intended Use Cases
- Question Answering: Responding accurately to Persian language queries
- Instruction Following: Interpreting and executing text-based instructions in Persian
- Text Generation: Producing fluent, context-aware Persian content
- Conversational AI: Integrating into chatbots and virtual assistants
- Image Input: Accepting image inputs, a capability retained from the multimodal base model (see the sketch after this list)
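Since the base model is multimodal, image-plus-text prompting should work through the processor API. A minimal sketch, assuming the Gemma 3 classes from the pinned Transformers release and that this repository includes the processor and vision weights from the base model; the image URL is a placeholder.

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/image.jpg"},  # placeholder URL
            {"type": "text", "text": "این تصویر را توصیف کن."},  # "Describe this image."
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(outputs[0], skip_special_tokens=True))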
Limitations
- Quantization Impact: 4-bit quantization may reduce output precision and result in occasional incoherent responses.
- Evaluation Scope: Comprehensive evaluation metrics for this variant have not yet been published.
- Bias: The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
- Hallucination: As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
- Safety: The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.
🔧 Maintenance and Future Work
This model is under active maintenance. Future updates may include:
- Additional evaluation metrics and benchmarks
- Enhanced safety tuning and bias mitigation strategies
- Expanded documentation and usage examples
- Incorporation of community feedback for iterative improvements
For any queries, contributions, or issues, please contact me.
📄 License
This model is licensed under the Apache 2.0 license.
📋 Metadata
Property | Details
--- | ---
Library Name | transformers
Tags | persian, text-generation, qlora, 4-bit-quantization
Base Model | google/gemma-3-4b-it
Datasets | mshojaei77/Persian_sft
Metrics | bleu