🚀 gemma-3-27b-it-qat-q4_0 GGUF Models
This project focuses on requantizing the Gemma-3 model, with in-depth testing and analysis that explores the performance differences between quantization methods.
🚀 Quick Start
Model Evaluation
The author conducted two types of tests on the model:

1. **Perplexity Test**:

   ```bash
   python3 ~/code/GGUFModelBuilder/perp_test_2_files.py ./gemma-3-4b-it-qat-q4_0-q3_k_l.gguf ./google_gemma-3-4b-it-q3_k_l.gguf
   ```

   The results show that the `gemma-3-4b-it-qat-q4_0-q3_k_l.gguf` model has the lower perplexity, indicating better performance.

2. **.NET Code Test**: When asked to write .NET code that tests whether a website uses quantum-safe encryption, the QAT q4_0 model outperforms the BF16 model in technical accuracy, code quality, security relevance, realism, and usability.
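For readers unfamiliar with the metric used in the perplexity test above: perplexity is the exponential of the mean token-level negative log-likelihood, and lower values mean the model fits the evaluation text better. A minimal sketch of the computation (the helper and the toy log-probabilities are hypothetical, not from the author's script):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.

    token_logprobs: per-token natural-log probabilities assigned by the model.
    """
    nll = [-lp for lp in token_logprobs]
    return math.exp(sum(nll) / len(nll))

# Toy log-probabilities for two hypothetical models scored on the same text:
model_a = [-1.2, -0.8, -1.5, -0.9]   # e.g. the QAT requant
model_b = [-1.4, -1.1, -1.7, -1.0]   # e.g. the BF16 requant

# The model with the lower perplexity fits the evaluation text better.
print(perplexity(model_a) < perplexity(model_b))  # True
```

A uniform distribution over `V` tokens would give perplexity exactly `V`, which is why the metric is often read as an "effective branching factor".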
Model Usage
Here are some code snippets to quickly start running the model:

**llama.cpp (text-only)**

```bash
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```

**llama.cpp (image input)**

```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```

**ollama (text only)**

```bash
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf
```
✨ Features
Model Advantages
- Multimodal Capability: Gemma 3 models can handle text and image input and generate text output, suitable for a variety of text generation and image understanding tasks.
- Large Context Window: A 128K-token context window lets the model process far more input at once.
- Multilingual Support: Supports over 140 languages, making the model accessible to users worldwide.
- Resource Efficiency: Its relatively small size allows it to be deployed in environments with limited resources, such as laptops, desktops, or personal cloud infrastructure.
Comparison of Different Models
- QAT q4_0 Model:
- Technical Accuracy: Checks both TLS version and cipher suites, aligns with security best practices, and explicitly acknowledges limitations.
- Code Quality: Uses modern async/await patterns, separates concerns into methods, and includes robust error handling and logging.
- Security Relevance: Focuses on cipher suites and mentions the need to update cipher lists based on NIST guidelines.
- Realism: Acknowledges the complexity of quantum-safe detection and clarifies that HTTP-based checks are insufficient.
- Usability: Provides clear console output and includes a working Main method with an example URL.
- BF16 Model:
- Technical Accuracy: Relies on checking for a non-standard TLS/1.3 header, contains incorrect logic, and does not align with security best practices.
- Code Quality: Uses blocking synchronous code, violates .NET best practices, and has poor code structure.
- Security Relevance: Misleadingly claims to check for a deprecated cipher mode but never implements it and fails to address cipher suites.
- Realism: Implies that checking for TLS 1.3 guarantees quantum safety, which is false.
- Usability: Fails to compile due to syntax errors and lacks meaningful output.
💻 Usage Examples
Basic Usage

```bash
# llama.cpp (text-only)
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```

Advanced Usage

```bash
# llama.cpp (image input)
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
📚 Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google. Built from the same research and technology used to create the Gemini models, Gemma 3 models are multimodal, handling text and image input and generating text output. They have open weights for both pre-trained variants and instruction-tuned variants. With a large, 128K context window, multilingual support in over 140 languages, and more sizes than previous versions, Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, such as question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in resource-limited environments, democratizing access to state-of-the-art AI models and fostering innovation.
Inputs and outputs
| Property | Details |
|---|---|
| Input | - Text string, such as a question, a prompt, or a document to be summarized<br>- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each<br>- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size |
| Output | - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document<br>- Total output context of 8192 tokens |
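The figures above allow a quick token-budget calculation: each image costs a fixed 256 tokens, so the rest of the window remains available for text. A minimal sketch (the helper is hypothetical, and it assumes "128K" means 128 × 1024 tokens):

```python
CONTEXT_TOKENS = 128 * 1024      # 4B/12B/27B context window, assuming 128K = 128 * 1024
TOKENS_PER_IMAGE = 256           # each 896 x 896 image is encoded to 256 tokens

def remaining_text_budget(num_images, context=CONTEXT_TOKENS):
    """Tokens left for text after accounting for the images in the prompt."""
    used = num_images * TOKENS_PER_IMAGE
    if used > context:
        raise ValueError("images alone exceed the context window")
    return context - used

# Even a prompt with 10 images leaves almost the whole window for text:
print(remaining_text_budget(10))  # 131072 - 2560 = 128512
```

In other words, image cost is negligible for typical prompts; the context window only becomes a constraint with hundreds of images or very long documents.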
Model Data
Training Dataset
These models were trained on a diverse dataset of text data from various sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B model with 2 trillion tokens. The key components include:
- Web Documents: A diverse collection of web text in over 140 languages exposes the model to a broad range of linguistic styles, topics, and vocabulary.
- Code: Exposing the model to code helps it learn programming language syntax and patterns, improving its ability to generate code and understand code - related questions.
- Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and address mathematical queries.
- Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
Data Preprocessing
The following key data cleaning and filtering methods were applied to the training data:
- CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to exclude harmful and illegal content.
- Sensitive Data Filtering: Automated techniques were used to filter out certain personal information and other sensitive data from training sets to make Gemma pre-trained models safe and reliable.
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies].
Implementation Information
Hardware
Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p, and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs offer several advantages:
- Performance: Specifically designed to handle the massive computations involved in training VLMs, they can speed up training considerably compared to CPUs.
- Memory: Often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training, which can lead to better model quality.
- Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. Training can be distributed across multiple TPU devices for faster and more efficient processing.
- Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially considering the time and resources saved due to faster training. These advantages are aligned with [Google's commitments to operate sustainably][sustainability].
Software
Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks, which makes it especially suitable for foundation models such as these.
🔧 Technical Details
Model Evaluation Benchmarks
Reasoning and factuality
| Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|---|---|
| [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 |
| [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 |
| [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 |
| [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 |
| [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 |
| [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 |
| [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 |
| [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 |
| [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 |
| [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 |
| [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 |
STEM and code
| Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|---|
| [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 |
| [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 |
| [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 |
| [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 |
| [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 |
| [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 |
| [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 |
| [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 |
Multilingual
| Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|---|
| [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 |
| [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 |
| [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 |
| [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 |
| [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 |
| [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 |
| [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 |
Multimodal
| Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|
| [COCOcap][coco-cap] | 102 | 111 | 116 |
| [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 |
| [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 |
| [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 |
| [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 |
| [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 |
| [ReMI][remi] | 27.3 | 38.5 | 44.8 |
| [AI2D][ai2d] | 63.2 | 75.2 | 79.0 |
| [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 |
| [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 |
| [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 |
| [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 |
| [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 |
| [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 |
| [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 |
📄 License
This model is released under the Gemma license. To access Gemma on Hugging Face, you are required to review and agree to Google's usage license; make sure you are logged in to Hugging Face and acknowledge the license on the model page. Requests are processed immediately.
Citation
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
⚠️ Important Note
This is an experimental requantization. The author wanted to test whether a requantized QAT model performs better than a BF16 model quantized to the same bit level. Please leave feedback.
Both models in the tests have fundamental misunderstandings of TLS/SSL mechanics. For a production-grade solution, direct inspection of the TLS handshake (e.g., via SslStream) and support for post-quantum algorithms would be required.
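As the note above says, a real check must inspect the negotiated handshake rather than HTTP headers. One building block is classifying the negotiated key-exchange group: only post-quantum or hybrid groups (e.g. X25519MLKEM768) provide quantum-safe key agreement, while classical ECDHE/RSA groups do not. A minimal, illustrative heuristic in Python (the group names listed are examples, not an exhaustive registry; this is a sketch, not a production check):

```python
# Hybrid/post-quantum key-exchange group names (illustrative examples only).
PQ_HYBRID_GROUPS = {"X25519MLKEM768", "X25519Kyber768Draft00", "SecP256r1MLKEM768"}
# Classical groups whose key agreement is not quantum-safe.
CLASSICAL_GROUPS = {"x25519", "secp256r1", "secp384r1", "ffdhe2048"}

def is_quantum_safe_key_exchange(group_name: str) -> bool:
    """Classify a negotiated TLS key-exchange group name.

    Returns True only for known PQ/hybrid groups; unknown names are treated
    conservatively as not quantum-safe rather than guessed at.
    """
    if group_name in PQ_HYBRID_GROUPS:
        return True
    if group_name.lower() in CLASSICAL_GROUPS:
        return False
    # Unknown group: do not claim quantum safety.
    return False

print(is_quantum_safe_key_exchange("X25519MLKEM768"))  # True
print(is_quantum_safe_key_exchange("x25519"))          # False
```

Note that this only classifies a name obtained from an actual handshake; negotiating TLS 1.3, by itself, says nothing about quantum safety.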
💡 Usage Tip
When using the model, pay attention to the input requirements, such as text format and image resolution. Also note that using GGUFs with Ollama via Hugging Face does not currently support image inputs. Please check the [docs on running gated repositories](https://huggingface.co/docs/hub/en/ollama#run-private-ggufs-from-the-hugging-face-hub).

