🚀 Gemma 3-4B Persian (v0)
mshojaei77/gemma-3-4b-persian-v0
is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA (4-bit quantization), which keeps the memory and compute cost of adaptation low while targeting Persian text generation and understanding. In addition to text generation, it retains the image input capabilities of its multimodal base model.

🚀 Quick Start
This model is compatible with both the Hugging Face Transformers library and Ollama.
💻 Usage Examples
Basic Usage
Running with Ollama
ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
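If you prefer calling the model programmatically, Ollama also exposes a local REST API. Below is a minimal sketch, assuming Ollama is running on its default port (11434), the model tag above has been pulled, and the requests package is installed; the prompt string is only illustrative.

import requests

# Call Ollama's local /api/generate endpoint for a single, non-streaming reply.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0",
        "prompt": "پایتخت ایران کجاست؟",  # "What is the capital of Iran?"
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])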
Running with Hugging Face Transformers
- Install Dependencies:
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
- Load Model and Tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"

# Load the model across available devices in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": "توماس جفرسون کیست؟",  # "Who is Thomas Jefferson?"
    }
]

# return_dict=True is needed so that generate() can unpack the encoded
# inputs (including the attention mask) with **inputs.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
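Because the model was trained with 4-bit QLoRA, you can optionally also load the checkpoint in 4-bit at inference time to cut memory usage. This is a sketch, assuming bitsandbytes is installed and a CUDA GPU is available; expect some loss of precision, as noted under Limitations.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Illustrative 4-bit loading config (NF4 is the quantization type used by QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    "mshojaei77/gemma-3-4b-persian-v0",
    device_map="auto",
    quantization_config=bnb_config,
)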
📚 Documentation
📦 Training Data and Fine-Tuning
Training Dataset
This model was fine-tuned using the mshojaei77/Persian_sft dataset, which contains approximately 681,000 rows of Persian text focused on instruction-following and conversational interactions.
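To inspect the data yourself, the dataset can be loaded with the datasets library. The split name below is an assumption based on the usual Hugging Face convention; adjust if the dataset uses a different layout.

from datasets import load_dataset

# Assumes the default "train" split.
ds = load_dataset("mshojaei77/Persian_sft", split="train")
print(ds[0])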
Fine-Tuning
- Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization); a minimal sketch of this setup follows this list
- Hardware: a single NVIDIA T4 GPU
- Software: Hugging Face Transformers, with peft for the QLoRA adapters and bitsandbytes for 4-bit quantization
- Trade-offs: reduced memory footprint at the expense of some precision compared to full-precision fine-tuning
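For readers who want to reproduce a comparable setup, here is a minimal QLoRA SFT sketch. It assumes a recent version of trl's SFTTrainer; the LoRA rank, target modules, and training arguments are illustrative, not the actual configuration used for this model.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    device_map="auto",
    quantization_config=bnb_config,
)

# Illustrative LoRA hyperparameters -- not the exact values used for this model.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Assumes the dataset's columns are in a chat format SFTTrainer can consume;
# add preprocessing if the actual schema differs.
trainer = SFTTrainer(
    model=model,
    train_dataset=load_dataset("mshojaei77/Persian_sft", split="train"),
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gemma-3-4b-persian-qlora",
        per_device_train_batch_size=1,
    ),
)
trainer.train()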
🔧 Evaluation
Evaluation results are not yet available; benchmarks for this variant will be added in a future update.
📄 Usage Considerations and Limitations
Intended Use Cases
- Question Answering: Responding accurately to Persian language queries
- Instruction Following: Interpreting and executing text-based instructions in Persian
- Text Generation: Producing fluent, context-aware Persian content
- Conversational AI: Integrating into chatbots and virtual assistants
- Image Input: Accepting image inputs, a capability retained from the multimodal base model (see the sketch after this list)
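Since the base model is multimodal, image-plus-text prompting should work through the processor API. A minimal sketch, assuming the Gemma 3 classes from the pinned Transformers release and that this repository includes the processor and vision weights from the base model; the image URL is a placeholder.

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/image.jpg"},  # placeholder URL
            {"type": "text", "text": "این تصویر را توصیف کن."},  # "Describe this image."
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(outputs[0], skip_special_tokens=True))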
Limitations
- Quantization Impact: 4-bit quantization may reduce output precision and result in occasional incoherent responses.
- Evaluation Scope: Comprehensive evaluation metrics for this variant have not yet been published.
- Bias: The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
- Hallucination: As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
- Safety: The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.
🔧 Maintenance and Future Work
This model is under active maintenance. Future updates may include:
- Additional evaluation metrics and benchmarks
- Enhanced safety tuning and bias mitigation strategies
- Expanded documentation and usage examples
- Incorporation of community feedback for iterative improvements
For any queries, contributions, or issues, please contact me.
📄 License
This model is licensed under the Apache 2.0 license.
📋 Metadata
Property | Details
--- | ---
Library Name | transformers
Tags | persian, text-generation, qlora, 4-bit-quantization
Base Model | google/gemma-3-4b-it
Datasets | mshojaei77/Persian_sft
Metrics | bleu