Gemma 3 4B Instruction-tuned QAT AutoAWQ
Gemma 3 4B Instruction-tuned QAT AutoAWQ is a checkpoint converted from google/gemma-3-4b-it-qat-q4_0-gguf to AutoAWQ format and BF16 dtype. The vision tower was transplanted from google/gemma-3-4b-it.
Below is the original model card.
Quick Start
Model Information
- Base Model: google/gemma-3-4b-it
- License: gemma
- Tags: gemma3, gemma, google
- Pipeline Tag: image-text-to-text
Usage Examples
Basic Usage
llama.cpp (text-only)
./llama-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
llama.cpp (image input)
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-4b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
ollama (text only)
ollama run hf.co/google/gemma-3-4b-it-qat-q4_0-gguf
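The llama.cpp and ollama commands above target the original GGUF release. Since this repository holds the AutoAWQ-format / BF16 conversion, it can also be driven through Hugging Face transformers. The snippet below is only a sketch: the repository id is a placeholder, and it assumes a recent transformers release with Gemma 3 support (plus autoawq installed if the AWQ weights are to be loaded directly).
```python
# Minimal sketch: loading the converted checkpoint with transformers.
# "REPO_ID" is a placeholder, not a confirmed repository name.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

REPO_ID = "path/to/gemma-3-4b-it-qat-autoawq"  # placeholder

processor = AutoProcessor.from_pretrained(REPO_ID)
model = Gemma3ForConditionalGeneration.from_pretrained(
    REPO_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true"},
        {"type": "text", "text": "Describe this image."},
    ]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
The chat-template pattern mirrors the upstream google/gemma-3-4b-it card; adjust max_new_tokens and the image URL as needed.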
Features
- Multimodal Capability: Handles text and image input, generating text output.
- Large Context Window: Supports a 128K context window (32K for the 1B size), enabling more comprehensive input.
- Multilingual Support: Offers support for over 140 languages.
- Open Weights: Both pre-trained and instruction-tuned variants have open weights.
Installation
No specific installation steps are provided in the original document.
Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same research and technology as the Gemini models. Gemma 3 models are multimodal, capable of handling text and image input and generating text output. They have open weights for both pre-trained and instruction-tuned variants. With a large 128K context window, multilingual support in over 140 languages, and more size options than previous versions, Gemma 3 models are suitable for various text generation and image understanding tasks. Their relatively small size allows deployment in resource-limited environments, democratizing access to advanced AI models.
Inputs and outputs
| Input | Output |
| --- | --- |
| - Text string (e.g., question, prompt, document to summarize)<br>- Images (normalized to 896 x 896 resolution and encoded to 256 tokens each)<br>- Total input context of 128K tokens for 4B, 12B, and 27B sizes; 32K tokens for 1B size | - Generated text (e.g., answer to a question, analysis of image content, summary of a document)<br>- Total output context of 8192 tokens |
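For a rough sense of the budget these figures imply, here is a back-of-the-envelope check (the 4,000-token prompt length is an assumed figure for the example, not from the original card):
```python
# Back-of-the-envelope input budget for the 4B/12B/27B sizes (128K context).
context_window = 128 * 1024   # total input tokens
tokens_per_image = 256        # each image is encoded to 256 tokens
text_tokens = 4_000           # assumed prompt/document length for this example

max_images = (context_window - text_tokens) // tokens_per_image
print(max_images)  # 496 images still fit alongside ~4K tokens of text
```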
Model Data
Training Dataset
These models were trained on a diverse text dataset, including web documents, code, mathematics, and images. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B model with 2 trillion tokens.
Data Preprocessing
- CSAM Filtering: Rigorous filtering to exclude child sexual abuse material.
- Sensitive Data Filtering: Automated techniques to filter out personal and sensitive data.
- Additional Methods: Filtering based on content quality and safety according to our policies.
Implementation Information
Hardware
Gemma was trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p, and TPUv5e). TPUs offer advantages in performance, memory, scalability, and cost-effectiveness, and their use aligns with Google's sustainability commitments.
Software
Training was done using JAX and ML Pathways. JAX enables efficient use of hardware, while ML Pathways is suitable for building foundation models.
Evaluation
Important Note
The evaluation in this section corresponds to the original checkpoint, not the QAT checkpoint.
Benchmark Results
The models were evaluated on various datasets and metrics for different aspects of text generation, including reasoning, STEM and code, multilingual, and multimodal tasks.
| Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| --- | --- | --- | --- | --- | --- |
| HellaSwag | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 |
| BoolQ | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 |
| ... | ... | ... | ... | ... | ... |
Ethics and Safety
Evaluation Approach
The evaluation methods include structured evaluations and internal red-teaming testing. The models were evaluated across categories such as child safety, content safety, and representational harms. Assurance evaluations are also conducted to inform responsibility governance decision making.
Evaluation Results
Significant improvements were observed in child safety, content safety, and representational harms compared to previous Gemma models. All testing was done without safety filters. However, the evaluations only included English language prompts.
Usage and Limitations
Intended Usage
- Content Creation and Communication: Text generation, chatbots, text summarization, and image data extraction.
- Research and Education: NLP and VLM research, language learning tools, and knowledge exploration.
Limitations
- Training Data: Quality and diversity of training data can affect model capabilities.
- Context and Task Complexity: Models perform better with clear prompts and instructions.
- Language Ambiguity and Nuance: May struggle with subtle language nuances.
- Factual Accuracy: May generate incorrect or outdated factual statements.
- Common Sense: May lack the ability to apply common sense reasoning in some situations.
Technical Details
Model Conversion
This checkpoint was converted from google/gemma-3-4b-it-qat-q4_0-gguf to AutoAWQ format and BF16 dtype.
Vision Tower Transplant
The vision tower was transplanted from google/gemma-3-4b-it.
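For illustration only, a transplant of this kind can be sketched by copying the donor's vision-tower parameters into the converted model by name. This is not the conversion script actually used for this checkpoint; the converted-model and output paths are placeholders, and the name filter assumes the standard vision_tower prefix used by Gemma 3 in transformers.
```python
# Minimal sketch of a vision-tower transplant (not the actual conversion script).
import torch
from transformers import Gemma3ForConditionalGeneration

# Donor model carrying the original (non-quantized) vision tower.
donor = Gemma3ForConditionalGeneration.from_pretrained(
    "google/gemma-3-4b-it", torch_dtype=torch.bfloat16
)
# Target: the checkpoint converted from the QAT GGUF weights (placeholder path).
target = Gemma3ForConditionalGeneration.from_pretrained(
    "path/to/converted-gemma-3-4b-it-qat", torch_dtype=torch.bfloat16
)

# Collect every donor parameter/buffer that belongs to the vision tower.
vision_weights = {
    name: tensor
    for name, tensor in donor.state_dict().items()
    if "vision_tower" in name
}
# strict=False leaves all non-vision weights in the target untouched.
missing, unexpected = target.load_state_dict(vision_weights, strict=False)
assert not unexpected, "vision-tower parameter names did not line up"

target.save_pretrained("gemma-3-4b-it-qat-autoawq-bf16")  # placeholder output dir
```
Matching by parameter name keeps the quantized language-model weights as they are and replaces only the vision stack.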
License
The model is distributed under the Gemma license.
Citation
@article{gemma_2025,
title={Gemma 3},
url={https://goo.gle/Gemma3Report},
publisher={Kaggle},
author={Gemma Team},
year={2025}
}