Gemma 3 model card
Gemma 3 is a multimodal model from Google, capable of handling text and image input and generating text output. It has a large context window, multilingual support, and is available in various sizes, making it suitable for a wide range of text generation and image understanding tasks.
Quick Start
This repository corresponds to the 4B instruction-tuned version of the Gemma 3 model in GGUF format, produced with Quantization Aware Training (QAT). The GGUF uses Q4_0 quantization. Thanks to QAT, the model preserves quality comparable to bfloat16 while significantly reducing the memory required to load it.
You can find the half-precision version here.
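If you prefer to work from a local copy of the weights, the Hugging Face CLI can download the repository and llama.cpp can load the file directly. A minimal sketch; the exact .gguf filename inside the repository is an assumption here, so check the downloaded folder:

```bash
# Download the repository contents (requires: pip install -U "huggingface_hub[cli]")
huggingface-cli download google/gemma-3-4b-it-qat-q4_0-gguf --local-dir ./gemma-3-4b-it-qat-q4_0-gguf

# Point llama.cpp at the downloaded file (filename below is illustrative; check the folder)
./llama-cli -m ./gemma-3-4b-it-qat-q4_0-gguf/gemma-3-4b-it-q4_0.gguf -p "Hello!"
```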
Features
- Multimodal Capability: Handles both text and image input, generating text output.
- Large Context Window: Has a 128K-token context window, enabling better handling of long-form content.
- Multilingual Support: Supports over 140 languages.
- Multiple Sizes: Available in different sizes (1B, 4B, 12B, 27B) to suit various resource requirements.
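As a quick illustration of the multilingual support listed above, the Ollama command shown in the usage examples accepts prompts in other languages directly:

```bash
# Prompt the model in French; any of the supported languages works the same way
ollama run hf.co/google/gemma-3-4b-it-qat-q4_0-gguf "Résume en deux phrases ce qu'est la quantification Q4_0."
```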
Installation
The original README describes no specific installation process. To use the model, however, you need a compatible runtime such as llama.cpp or Ollama installed; a minimal setup sketch follows.
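The sketch below assumes a Linux or macOS machine with git, CMake, and a C++ toolchain; adjust for your platform:

```bash
# Build llama.cpp from source (provides llama-cli, llama-server, and related tools)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Alternatively, install Ollama via its convenience script (Linux; see ollama.com for other platforms)
curl -fsSL https://ollama.com/install.sh | sh
```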
Usage Examples
Basic Usage
Run the model directly with llama.cpp or pull it through Ollama:

```bash
# llama.cpp
./llama-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."

# Ollama
ollama run hf.co/google/gemma-3-4b-it-qat-q4_0-gguf
```
Advanced Usage
Multimodal input: download an example image, then ask the model to describe it:

```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
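For programmatic access, llama.cpp also ships llama-server, which exposes an OpenAI-compatible HTTP endpoint. A minimal sketch, assuming your build includes llama-server and supports the same -hf flag used above:

```bash
# Start the server on port 8080, pulling the model from the Hugging Face repo
./llama-server -hf google/gemma-3-4b-it-qat-q4_0-gguf --port 8080

# In another shell: query the OpenAI-compatible chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a poem about the Kraken."}]}'
```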
Documentation
Model Information
- Description: Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants.
- Inputs and outputs:
- Input:
- Text string, such as a question, a prompt, or a document to be summarized.
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each.
- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size.
- Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document.
- Total output context of 8192 tokens.
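These limits map directly onto runtime settings. For example, with llama.cpp the context window and the response length are set explicitly; a minimal sketch with illustrative values:

```bash
# -c sets the context size in tokens (the 4B model supports up to 128K),
# -n limits how many tokens are generated for the response (the model's output limit is 8192)
./llama-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -c 8192 -n 512 -p "Summarize this document: ..."
```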
Model Data
- Training Dataset: These models were trained on a dataset of text drawn from a wide variety of sources. The 27B model was trained on 14 trillion tokens, the 12B model on 12 trillion, the 4B model on 4 trillion, and the 1B model on 2 trillion. Key components include web documents, code, mathematics, and images.
- Data Preprocessing:
- CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process.
- Sensitive Data Filtering: Automated techniques were used to filter out certain personal information and other sensitive data from training sets.
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies].
Implementation Information
- Hardware: Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p and TPUv5e). TPUs offer advantages in performance, memory, scalability, and cost-effectiveness, and are aligned with [Google's commitments to operate sustainably][sustainability].
- Software: Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows for faster and more efficient training on hardware like TPUs, and ML Pathways is suitable for building foundation models.
Evaluation
The evaluation in this section corresponds to the original checkpoint, not the QAT checkpoint.
- Benchmark Results:
- Reasoning and factuality: Evaluated on benchmarks like [HellaSwag][hellaswag], [BoolQ][boolq], etc.
- STEM and code: Tested on benchmarks such as [MMLU][mmlu], [AGIEval][agieval], etc.
- Multilingual: Benchmarked on [MGSM][mgsm], [Global-MMLU-Lite][global-mmlu-lite], etc.
- Multimodal: Evaluated using benchmarks like [COCOcap][coco-cap], [DocVQA][docvqa], etc.
Ethics and Safety
- Evaluation Approach: Includes structured evaluations and internal red-teaming testing of relevant content policies. Categories evaluated are child safety, content safety, and representational harms. Assurance evaluations are also conducted for responsibility governance decision-making.
- Evaluation Results: Major improvements were seen in child safety, content safety, and representational harms relative to previous Gemma models. All testing was conducted without safety filters. A limitation is that only English language prompts were included.
Usage and Limitations
- Intended Usage:
- Content Creation and Communication: Text generation, chatbots, text summarization, image data extraction.
- Research and Education: NLP and VLM research, language learning tools, knowledge exploration.
- Limitations:
- Training Data: Quality and diversity of training data can affect model capabilities.
- Context and Task Complexity: Models may struggle with open-ended or highly complex tasks.
- Language Ambiguity and Nuance: Difficulty in grasping subtle language nuances.
- Factual Accuracy: May generate incorrect or outdated factual statements.
- Common Sense: Lack of common sense reasoning in certain situations.
- Ethical Considerations and Risks: Concerns about bias and fairness, privacy and security, and misuse of the model.
Technical Details
- Model Architecture: Based on the same research and technology as the Gemini models, enabling multimodal processing.
- Quantization: Uses Quantization Aware Training (QAT) with Q4_0 quantization in the GGUF format, reducing memory requirements while maintaining quality.
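The published GGUF already ships in Q4_0, so no conversion step is needed to use it. Purely for reference, this is roughly how a Q4_0 GGUF is produced from a half-precision GGUF with llama.cpp's quantization tool; it is a sketch with placeholder filenames and does not reproduce the QAT checkpoint, which was quantized by the model authors:

```bash
# Post-training Q4_0 quantization of a bf16/f16 GGUF with llama.cpp (illustrative filenames)
./llama-quantize gemma-3-4b-it-bf16.gguf gemma-3-4b-it-q4_0.gguf Q4_0
```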
License
The model is under the [Gemma][terms] license.
Citation
```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}
```
Additional Resources
- Model Page: Gemma
- Resources and Technical Documentation:
- [Gemma 3 Technical Report][g3-tech-report]
- [Responsible Generative AI Toolkit][rai-toolkit]
- [Gemma on Kaggle][kaggle-gemma]
- [Gemma on Vertex Model Garden][vertex-mg-gemma3]
- Terms of Use: [Terms][terms]
- Authors: Google DeepMind
[g3-tech-report]: https://example.com/g3-tech-report
[rai-toolkit]: https://example.com/rai-toolkit
[kaggle-gemma]: https://example.com/kaggle-gemma
[vertex-mg-gemma3]: https://example.com/vertex-mg-gemma3
[terms]: https://example.com/terms
[tpu]: https://example.com/tpu
[jax]: https://example.com/jax
[ml-pathways]: https://example.com/ml-pathways
[gemini-2-paper]: https://example.com/gemini-2-paper
[hellaswag]: https://arxiv.org/abs/1905.07830
[boolq]: https://arxiv.org/abs/1905.10044
[piqa]: https://arxiv.org/abs/1911.11641
[socialiqa]: https://arxiv.org/abs/1904.09728
[triviaqa]: https://arxiv.org/abs/1705.03551
[naturalq]: https://github.com/google-research-datasets/natural-questions
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[drop]: https://arxiv.org/abs/1903.00161
[mmlu]: https://arxiv.org/abs/2009.03300
[agieval]: https://arxiv.org/abs/2304.06364
[math]: https://arxiv.org/abs/2103.03874
[gsm8k]: https://arxiv.org/abs/2110.14168
[gpqa]: https://arxiv.org/abs/2311.12022
[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[mgsm]: https://arxiv.org/abs/2210.03057
[flores]: https://arxiv.org/abs/2106.03193
[xquad]: https://arxiv.org/abs/1910.11856v3
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[eclektic]: https://arxiv.org/abs/2502.21228
[indicgenbench]: https://arxiv.org/abs/2404.16816
[coco-cap]: https://cocodataset.org/#home
[docvqa]: https://www.docvqa.org/
[info-vqa]: https://arxiv.org/abs/2104.12756
[mmmu]: https://arxiv.org/abs/2311.16502
[textvqa]: https://textvqa.org/
[realworldqa]: https://paperswithcode.com/dataset/realworldqa
[remi]: https://arxiv.org/html/2406.09175v1
[ai2d]: https://allenai.org/data/diagrams
[chartqa]: https://arxiv.org/abs/2203.10244
[vqav2]: https://visualqa.org/index.html
[blinkvqa]: https://arxiv.org/abs/2404.12390
[okvqa]: https://okvqa.allenai.org/
[tallyqa]: https://arxiv.org/abs/1810.12440
[ss-vqa]: https://arxiv.org/abs/1908.02660
[countbenchqa]: https://github.com/google-research/big_vision/blob/main/big_vision/datasets/countbenchqa/
[safety-policies]: https://example.com/safety-policies
[sustainability]: https://example.com/sustainability