🚀 Gemma 3 12B Instruction-tuned QAT AutoAWQ
This project provides the model checkpoint converted from the GGUF format to the AutoAWQ format and the BF16 data type, so users can make efficient use of the model. The vision tower is taken from the official Google repository, preserving high-quality visual processing.
🚀 Quick Start
This checkpoint was converted from https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf to AutoAWQ format and BF16 dtype (hence, not lossless). The vision tower was transplanted from https://huggingface.co/google/gemma-3-12b-it.
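For reference, here is a minimal text-generation sketch with 🤗 Transformers. The repo id below is a hypothetical placeholder for this converted checkpoint, and a recent transformers release with Gemma 3 support (plus `accelerate` for `device_map="auto"`) is assumed:

```python
# Minimal sketch, assuming a transformers release with Gemma 3 support.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "your-org/gemma-3-12b-it-qat-autoawq"  # hypothetical placeholder repo id

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Write a poem about the Kraken."}]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```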
Below is the original model card.
✨ Features
- Multimodal Capability: Handles both text and image input, generating text output.
- Open-Source Weights: Both pre-trained and instruction-tuned variants have open weights.
- Large Context Window: Supports a 128K context window, enabling the handling of long-form content.
- Multilingual Support: Capable of processing over 140 languages.
- Resource-Friendly: Can be deployed on laptops, desktops, or cloud infrastructure with limited resources.
📦 Installation
To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click the "Acknowledge license" button below. Requests are processed immediately.
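Once the license is accepted, authenticate your environment so gated downloads succeed. A minimal sketch using the `huggingface_hub` client (the CLI equivalent is `huggingface-cli login`):

```python
# Authenticate to Hugging Face so gated checkpoints can be downloaded.
from huggingface_hub import login

login()  # prompts for an access token from https://huggingface.co/settings/tokens
```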
💻 Usage Examples
Basic Usage
llama.cpp (text-only)
```bash
./llama-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```
llama.cpp (image input)
```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
ollama (text-only)
Using GGUFs with Ollama via Hugging Face does not support image inputs at the moment. Please check the docs on running gated repositories.
```bash
ollama run hf.co/google/gemma-3-12b-it-qat-q4_0-gguf
```
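Since the Ollama path above is text-only, here is a hedged transformers sketch for image input, using the same hypothetical placeholder repo id as in the Quick Start and the image from the llama.cpp example:

```python
# Sketch of image+text inference; model_id is a hypothetical placeholder.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "your-org/gemma-3-12b-it-qat-autoawq"  # placeholder, not the real repo id
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```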
📚 Documentation
Model Information
- Description: Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same technology as Gemini. Gemma 3 models are multimodal, handling text and image input and generating text output.
- Inputs and Outputs:
- Input: Text string or images (normalized to 896 x 896 resolution and encoded to 256 tokens each), with a total input context of 128K tokens for the 4B, 12B, and 27B sizes and 32K tokens for the 1B size (see the budgeting sketch below).
- Output: Generated text, with a total output context of 8192 tokens.
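To make the budget concrete, a small back-of-the-envelope helper using the numbers above (a sketch, not part of the original card; it treats 128K as 128 x 1024 tokens):

```python
# Token budgeting from the spec above: each image costs 256 tokens,
# and the 12B model has a 128K-token input context.
IMAGE_TOKENS = 256
INPUT_CONTEXT = 128 * 1024  # 131072 tokens

def text_budget(num_images: int) -> int:
    """Tokens left for text after reserving space for image embeddings."""
    return INPUT_CONTEXT - num_images * IMAGE_TOKENS

print(text_budget(4))  # -> 130048 tokens remain for text alongside 4 images
```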
Model Data
- Training Dataset: Trained on a diverse dataset including web documents, code, mathematics, and images. The model sizes were trained on different token budgets (14T tokens for 27B, 12T for 12B, 4T for 4B, and 2T for 1B).
- Data Preprocessing: Applied CSAM filtering, sensitive-data filtering, and other filtering methods based on content quality and safety.
Implementation Information
- Hardware: Trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p, and TPUv5e), offering performance, memory, scalability, cost-effectiveness, and alignment with Google's sustainability commitments.
- Software: Trained using JAX and ML Pathways, whose combination provides a single-controller programming model that simplifies the development workflow.
Evaluation
- Benchmark Results: Evaluated against various datasets and metrics in different aspects such as reasoning, STEM and code, multilingual, and multimodal tasks.
- Reasoning and factuality: Tested on benchmarks like HellaSwag, BoolQ, etc.
- STEM and code: Benchmarked on [MMLU][mmlu], [AGIEval][agieval], etc.
- Multilingual: Evaluated using [MGSM][mgsm], [Global-MMLU-Lite][global-mmlu-lite], etc.
- Multimodal: Tested on [COCO Caption][coco-cap], [DocVQA][docvqa], etc.
Ethics and Safety
- Evaluation Approach: Conducted structured evaluations and internal red-teaming testing, covering child safety, content safety, and representational harms. Also includes "assurance evaluations" for release decision-making.
- Evaluation Results: Showed major improvements in safety categories compared to previous Gemma models, with minimal policy violations. However, evaluations were limited to English-language prompts.
Usage and Limitations
- Intended Usage: Can be used for content creation, chatbots, text summarization, image data extraction, research, and education.
- Limitations: Affected by training data quality and diversity, context and task complexity, language ambiguity, and factual accuracy.
🔧 Technical Details
- Model Page: Gemma
- Terms of Use: Terms
- Authors: Google DeepMind
📄 License
This model is released under the Gemma license.
Citation
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
[naturalq]: https://github.com/google-research-datasets/natural-questions
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[drop]: https://arxiv.org/abs/1903.00161
[mmlu]: https://arxiv.org/abs/2009.03300
[agieval]: https://arxiv.org/abs/2304.06364
[math]: https://arxiv.org/abs/2103.03874
[gsm8k]: https://arxiv.org/abs/2110.14168
[gpqa]: https://arxiv.org/abs/2311.12022
[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[mgsm]: https://arxiv.org/abs/2210.03057
[flores]: https://arxiv.org/abs/2106.03193
[xquad]: https://arxiv.org/abs/1910.11856v3
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[eclektic]: https://arxiv.org/abs/2502.21228
[indicgenbench]: https://arxiv.org/abs/2404.16816
[coco-cap]: https://cocodataset.org/#home
[docvqa]: https://www.docvqa.org/
[info-vqa]: https://arxiv.org/abs/2104.12756
[mmmu]: https://arxiv.org/abs/2311.16502
[textvqa]: https://textvqa.org/
[realworldqa]: https://paperswithcode.com/dataset/realworldqa
[remi]: https://arxiv.org/html/2406.09175v1
[ai2d]: https://allenai.org/data/diagrams
[chartqa]: https://arxiv.org/abs/2203.10244
[vqav2]: https://visualqa.org/index.html
[blinkvqa]: https://arxiv.org/abs/2404.12390
[okvqa]: https://okvqa.allenai.org/
[tallyqa]: https://arxiv.org/abs/1810.12440
[ss-vqa]: https://arxiv.org/abs/1908.02660
[countbenchqa]: https://github.com/google-research/big_vision/blob/main/big_vision/datasets/countbenchqa/
[safety-policies]: #
[sustainability]: #