🚀 Gemma-3 12B Instruct GGUF Models
Gemma-3 12B Instruct GGUF Models are experimental requantizations. The goal is to test whether requantizing Google's quantization-aware-trained (QAT) Q4_0 release performs better than quantizing the original bf16 model to the same bit level.
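For reference, a requantization of this kind can be produced with llama.cpp's llama-quantize tool. The file names below are illustrative; the --allow-requantize flag is what permits quantizing tensors that are already quantized:

```
# Quantize the original bf16 release directly to Q3_K_L (file names illustrative)
./llama-quantize gemma-3-4b-it-bf16.gguf google_gemma-3-4b-it-q3_k_l.gguf Q3_K_L

# Requantize the QAT Q4_0 release to the same level
./llama-quantize --allow-requantize gemma-3-4b-it-qat-q4_0.gguf gemma-3-4b-it-qat-q4_0-q3_k_l.gguf Q3_K_L
```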
🚀 Quick Start
Model Comparison Test
The author tested two versions of the 4B model: one quantized from bf16 and one requantized from the QAT Q4_0 release. Both were built with the same tensor quantization settings.
```
python3 ~/code/GGUFModelBuilder/perp_test_2_files.py ./gemma-3-4b-it-qat-q4_0-q3_k_l.gguf ./google_gemma-3-4b-it-q3_k_l.gguf

Testing model: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf
Running: llama.cpp/llama-perplexity -m gemma-3-4b-it-qat-q4_0-q3_k_l.gguf -f perplexity_test_data.txt --ctx-size 256 --ppl-stride 32 --chunks 1 --threads 4
[✓] Perplexity: 4.0963 (Time: 284.70s)

Testing model: google_gemma-3-4b-it-q3_k_l.gguf
Running: llama.cpp/llama-perplexity -m google_gemma-3-4b-it-q3_k_l.gguf -f perplexity_test_data.txt --ctx-size 256 --ppl-stride 32 --chunks 1 --threads 4
[✓] Perplexity: 4.5557 (Time: 287.15s)

=== Comparison Results ===
Model 1: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf - Perplexity: 4.10 (Time: 284.70s)
Model 2: google_gemma-3-4b-it-q3_k_l.gguf - Perplexity: 4.56 (Time: 287.15s)
Winner: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf (Difference: 0.46)
```
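The comparison script itself is not included in this card, but a minimal sketch of such a harness might look like the following. It assumes llama-perplexity prints a "Final estimate: PPL = …" line (current llama.cpp builds do) and reuses the exact flags from the runs above; everything else is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a two-model perplexity comparison harness."""
import re
import subprocess
import sys
import time

def run_perplexity(model_path: str, data_file: str = "perplexity_test_data.txt") -> tuple[float, float]:
    """Run llama-perplexity on one GGUF file and return (perplexity, seconds)."""
    cmd = [
        "llama.cpp/llama-perplexity", "-m", model_path, "-f", data_file,
        "--ctx-size", "256", "--ppl-stride", "32", "--chunks", "1", "--threads", "4",
    ]
    start = time.time()
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.time() - start
    # llama-perplexity reports e.g. "Final estimate: PPL = 4.0963 +/- ..."
    match = re.search(r"Final estimate: PPL = ([\d.]+)", proc.stdout + proc.stderr)
    if match is None:
        raise RuntimeError(f"could not parse perplexity for {model_path}")
    return float(match.group(1)), elapsed

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} model1.gguf model2.gguf")
    results = {m: run_perplexity(m) for m in sys.argv[1:]}
    for model, (ppl, secs) in results.items():
        print(f"{model} - Perplexity: {ppl:.2f} (Time: {secs:.2f}s)")
    winner = min(results, key=lambda m: results[m][0])
    low, high = sorted(p for p, _ in results.values())
    print(f"Winner: {winner} (Difference: {high - low:.2f})")
```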
Code Functionality Comparison
The author also compared the two models' code output when each was asked to write .NET code that tests whether a website uses quantum-safe encryption.
Technical Accuracy
- QAT q4_0 Model: Checks both the TLS version and cipher suites, which are critical for assessing quantum resistance, and explicitly acknowledges its limitations.
- BF16 Model: Relies on checking for a non-standard TLS/1.3 header that does not exist in HTTP responses; its logic is incorrect.
Code Quality
- QAT q4_0 Model: Uses modern async/await patterns for non-blocking I/O, separates concerns into methods, and includes robust error handling and logging.
- BF16 Model: Uses blocking synchronous code, which violates .NET best practices and risks deadlocks, and is poorly structured.
Security Relevance
- QAT q4_0 Model: Focuses on cipher suites and mentions the need to update cipher lists based on NIST guidelines.
- BF16 Model: Misleadingly claims to check for "AES-256-CBC" but never implements it and fails to address cipher suites.
Realism
- QAT q4_0 Model: Acknowledges the complexity of quantum-safe detection and clarifies that HTTP-based checks are insufficient.
- BF16 Model: Implies that checking for TLS 1.3 guarantees quantum safety, which is false.
Usability
- QAT q4_0 Model: Provides clear console output and includes a working Main method with an example URL.
- BF16 Model: Fails to compile due to syntax errors and lacks meaningful output.
Critical Flaws in Both Models
- Header Misuse: Both models incorrectly assume the TLS version and cipher suites are exposed in HTTP headers (see the sketch after this list).
- Quantum-Safe Misunderstanding: Neither model's code checks for post-quantum algorithms, so both produce false positives.
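To make the header-misuse point concrete, here is a minimal Python sketch (not taken from either model's output) showing where the TLS version and cipher suite actually come from: the TLS handshake, exposed through the standard-library ssl module, never through HTTP response headers. The host name is just an example.

```python
import socket
import ssl

def inspect_tls(host: str, port: int = 443) -> None:
    """Print the negotiated TLS version and cipher suite for a host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # These values come from the handshake, not from HTTP headers.
            print(f"TLS version : {tls.version()}")    # e.g. 'TLSv1.3'
            print(f"Cipher suite: {tls.cipher()[0]}")  # e.g. 'TLS_AES_256_GCM_SHA384'
            # Caveat: a modern cipher suite does NOT imply quantum safety.
            # Post-quantum key exchange (e.g. hybrid X25519 + ML-KEM) is not
            # visible through this API and requires lower-level handshake
            # inspection to detect.

inspect_tls("example.com")
```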
Final Verdict
The QAT q4_0 model's code is superior, but both models fail to solve the original problem due to fundamental misunderstandings of TLS/SSL mechanics. Further investigation is needed.
✨ Features
- Multimodal Capability: Gemma 3 models can handle text and image input and generate text output.
- Large Context Window: A 128K context window (32K for the 1B size) enables long-form input.
- Multilingual Support: Supports over 140 languages.
- Open Weights: Both pre-trained and instruction-tuned variants have open weights.
- Suitable for Limited Resources: Its relatively small size allows deployment in environments with limited resources.
💻 Usage Examples
Basic Usage
llama.cpp (text-only)
```
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```
llama.cpp (image input)
```
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
ollama (text-only)
```
ollama run hf.co/google/gemma-3-12b-it-qat-q4_0-gguf
```
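To script these calls rather than run them interactively, a small Python wrapper around llama-cli might look like this. It is a sketch: it assumes llama-cli sits in the working directory and uses only the flags shown above.

```python
import subprocess

def run_gemma(prompt: str, model: str = "google/gemma-3-12b-it-qat-q4_0-gguf") -> str:
    """Run a single prompt through llama-cli and return its raw stdout."""
    # Flags mirror the CLI examples above; recent llama.cpp builds may also
    # need "-no-cnv" to skip interactive conversation mode.
    result = subprocess.run(
        ["./llama-cli", "-hf", model, "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(run_gemma("Write a poem about the Kraken."))
```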
📚 Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google. Built on the same technology as Gemini models, Gemma 3 models are multimodal, handling text and image input and generating text output. They have open weights for both pre-trained and instruction-tuned variants. With a large 128K context window, multilingual support in over 140 languages, and more size options than previous versions, Gemma 3 models are suitable for various text generation and image understanding tasks. Their relatively small size allows deployment in resource-limited environments, democratizing access to advanced AI models.
Inputs and outputs
| Property | Details |
|----------|---------|
| Input | Text strings, such as a question, a prompt, or a document to be summarized; images, normalized to 896 x 896 resolution and encoded to 256 tokens each; total input context of 128K tokens for the 4B, 12B, and 27B sizes (32K for the 1B size) |
| Output | Generated text in response to the input, such as an answer to a question, an analysis of image content, or a summary of a document; total output context of 8192 tokens |
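As a rough illustration of the figures above, the sketch below (a hypothetical planning aid, not an official API) estimates how much room remains for text once images and output are accounted for; it assumes generated tokens are reserved out of the same window:

```python
CONTEXT_WINDOW = 128_000   # 4B/12B/27B sizes (use 32_000 for the 1B size)
TOKENS_PER_IMAGE = 256     # each image is normalized to 896x896 -> 256 tokens
OUTPUT_RESERVE = 8_192     # maximum output context per the table above

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text input after images and reserved output."""
    return CONTEXT_WINDOW - num_images * TOKENS_PER_IMAGE - OUTPUT_RESERVE

print(remaining_text_budget(4))  # -> 118784
```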
Model Data
Training Dataset
These models were trained on a diverse dataset that includes web documents, code, mathematics, and images. The 12B model was trained on 12 trillion tokens, the 4B model on 4 trillion, and the 1B model on 2 trillion. The combination of these data sources is crucial for training a capable multimodal model.
Data Preprocessing
- CSAM Filtering: Rigorous CSAM filtering was applied at multiple stages to exclude harmful and illegal content.
- Sensitive Data Filtering: Automated techniques were used to filter out certain personal information and other sensitive data.
- Additional Methods: Filtering based on content quality and safety, in line with [our policies][safety-policies].
Implementation Information
Hardware
Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p, and TPUv5e). TPUs offer advantages in performance, memory, scalability, and cost-effectiveness for training vision-language models.
Software
Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows for faster and more efficient training on modern hardware, and ML Pathways is suitable for building general-purpose AI systems.
Intended Usage
- Content Creation and Communication: Text generation, chatbots, text summarization, and image data extraction.
- Research and Education: NLP and VLM research, language learning tools, and knowledge exploration.
Limitations
- Training Data: The quality and diversity of training data can influence the model's capabilities.
- Context and Task Complexity: Models are better at well-defined tasks and can be affected by the amount of context provided.
- Language Ambiguity and Nuance: Models may struggle with subtle language nuances.
- Factual Accuracy: They may generate incorrect or outdated factual statements.
- Common Sense: Models may lack the ability to apply common sense reasoning.
Ethical Considerations and Risks
- Bias and Fairness: VLMs can reflect socio-cultural biases in the training data.
- Misinformation and Misuse: They can be misused to generate false or harmful content.
- Transparency and Accountability: The model card provides details on the model's architecture, capabilities, limitations, and evaluation processes.
Citation
```
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}
```
🔧 Technical Details
- Model Page: Gemma
- Terms of Use: [Terms][terms]
- Authors: Google DeepMind
📄 License
The model is released under the Gemma license. To access Gemma on Hugging Face, you're required to review and agree to Google's usage license. To do so, make sure you're logged in to Hugging Face and click the "Acknowledge license" button. Requests are processed immediately.
[kaggle-gemma]: https://www.kaggle.com/models/google/gemma-3
[vertex-mg-gemma3]: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma3
[terms]: https://ai.google.dev/gemma/terms
[safety-policies]: https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf
[prohibited-use]: https://ai.google.dev/gemma/prohibited_use_policy
[tpu]: https://cloud.google.com/tpu/docs/intro-to-tpu
[sustainability]: https://sustainability.google/operating-sustainably/
[jax]: https://github.com/jax-ml/jax
[ml-pathways]: https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/
[gemini-2-paper]: https://arxiv.org/abs/2312.11805