🚀 gemma-3-27b-it-qat-q4_0 GGUF Models
This project focuses on requantizing the Gemma-3 model, with in-depth testing and analysis that explores the performance differences between quantization methods.
🚀 Quick Start
Model Evaluation
The author conducted two types of tests on the model:

1. **Perplexity Test**:

   ```bash
   python3 ~/code/GGUFModelBuilder/perp_test_2_files.py ./gemma-3-4b-it-qat-q4_0-q3_k_l.gguf ./google_gemma-3-4b-it-q3_k_l.gguf
   ```

   The results show that the `gemma-3-4b-it-qat-q4_0-q3_k_l.gguf` model has the lower perplexity, indicating better performance.

2. **.NET Code Test**: When asked to write .NET code that tests whether a website uses quantum-safe encryption, the QAT q4_0 model outperforms the BF16 model in technical accuracy, code quality, security relevance, realism, and usability.
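For readers unfamiliar with the metric used in the perplexity test above: perplexity is the exponential of the mean token-level negative log-likelihood, and lower values mean the model fits the evaluation text better. A minimal sketch of the computation (the helper and the toy log-probabilities are hypothetical, not from the author's script):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.

    token_logprobs: per-token natural-log probabilities assigned by the model.
    """
    nll = [-lp for lp in token_logprobs]
    return math.exp(sum(nll) / len(nll))

# Toy log-probabilities for two hypothetical models scored on the same text:
model_a = [-1.2, -0.8, -1.5, -0.9]   # e.g. the QAT requant
model_b = [-1.4, -1.1, -1.7, -1.0]   # e.g. the BF16 requant

# The model with the lower perplexity fits the evaluation text better.
print(perplexity(model_a) < perplexity(model_b))  # True
```

A uniform distribution over `V` tokens would give perplexity exactly `V`, which is why the metric is often read as an "effective branching factor".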
Model Usage
Here are some code snippets to quickly start running the model:

**llama.cpp (text-only)**

```bash
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```

**llama.cpp (image input)**

```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```

**ollama (text only)**

```bash
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf
```
✨ Features
Model Advantages
- Multimodal Capability: Gemma 3 models can handle text and image input and generate text output, suitable for a variety of text generation and image understanding tasks.
- Large Context Window: A 128K-token context window lets the model process far more input at once.
- Multilingual Support: Supports over 140 languages, making the model accessible to users worldwide.
- Resource Efficiency: Its relatively small size allows it to be deployed in environments with limited resources, such as laptops, desktops, or personal cloud infrastructure.
Comparison of Different Models
- QAT q4_0 Model:
- Technical Accuracy: Checks both TLS version and cipher suites, aligns with security best practices, and explicitly acknowledges limitations.
- Code Quality: Uses modern async/await patterns, separates concerns into methods, and includes robust error handling and logging.
- Security Relevance: Focuses on cipher suites and mentions the need to update cipher lists based on NIST guidelines.
- Realism: Acknowledges the complexity of quantum-safe detection and clarifies that HTTP-based checks are insufficient.
- Usability: Provides clear console output and includes a working Main method with an example URL.
- BF16 Model:
- Technical Accuracy: Relies on checking for a non-standard TLS/1.3 header, contains incorrect logic, and does not align with security best practices.
- Code Quality: Uses blocking synchronous code, violates .NET best practices, and has poor code structure.
- Security Relevance: Misleadingly claims to check for a deprecated cipher mode but never implements it and fails to address cipher suites.
- Realism: Implies that checking for TLS 1.3 guarantees quantum safety, which is false.
- Usability: Fails to compile due to syntax errors and lacks meaningful output.
💻 Usage Examples
Basic Usage

```bash
# llama.cpp (text-only)
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```

Advanced Usage

```bash
# llama.cpp (image input)
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
📚 Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google. Built from the same research and technology used to create the Gemini models, Gemma 3 models are multimodal, handling text and image input and generating text output. They have open weights for both pre-trained variants and instruction-tuned variants. With a large, 128K context window, multilingual support in over 140 languages, and more sizes than previous versions, Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, such as question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in resource-limited environments, democratizing access to state-of-the-art AI models and fostering innovation.
Inputs and outputs
| Property | Details |
|---|---|
| Input | - Text string, such as a question, a prompt, or a document to be summarized<br>- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each<br>- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size |
| Output | - Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document<br>- Total output context of 8192 tokens |
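The figures above allow a quick token-budget calculation: each image costs a fixed 256 tokens, so the rest of the window remains available for text. A minimal sketch (the helper is hypothetical, and it assumes "128K" means 128 × 1024 tokens):

```python
CONTEXT_TOKENS = 128 * 1024      # 4B/12B/27B context window, assuming 128K = 128 * 1024
TOKENS_PER_IMAGE = 256           # each 896 x 896 image is encoded to 256 tokens

def remaining_text_budget(num_images, context=CONTEXT_TOKENS):
    """Tokens left for text after accounting for the images in the prompt."""
    used = num_images * TOKENS_PER_IMAGE
    if used > context:
        raise ValueError("images alone exceed the context window")
    return context - used

# Even a prompt with 10 images leaves almost the whole window for text:
print(remaining_text_budget(10))  # 131072 - 2560 = 128512
```

In other words, image cost is negligible for typical prompts; the context window only becomes a constraint with hundreds of images or very long documents.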
Model Data
Training Dataset
These models were trained on a diverse dataset of text data from various sources. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B model with 2 trillion tokens. The key components include:
- Web Documents: A diverse collection of web text in over 140 languages exposes the model to a broad range of linguistic styles, topics, and vocabulary.
- Code: Exposing the model to code helps it learn programming language syntax and patterns, improving its ability to generate code and understand code - related questions.
- Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and address mathematical queries.
- Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
Data Preprocessing
The following key data cleaning and filtering methods were applied to the training data:
- CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to exclude harmful and illegal content.
- Sensitive Data Filtering: Automated techniques were used to filter out certain personal information and other sensitive data from training sets to make Gemma pre-trained models safe and reliable.
- Additional methods: Filtering based on content quality and safety in line with [our policies][safety-policies].
Implementation Information
Hardware
Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p, and TPUv5e). Training vision-language models (VLMs) requires significant computational power. TPUs offer several advantages:
- Performance: Specifically designed to handle the massive computations involved in training VLMs, they can speed up training considerably compared to CPUs.
- Memory: Often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training, which can lead to better model quality.
- Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for handling the growing complexity of large foundation models. Training can be distributed across multiple TPU devices for faster and more efficient processing.
- Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective solution for training large models compared to CPU-based infrastructure, especially considering the time and resources saved due to faster training. These advantages are aligned with [Google's commitments to operate sustainably][sustainability].
Software
Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models. ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks, which makes it especially suitable for foundation models such as these.
🔧 Technical Details
Model Evaluation Benchmarks
Reasoning and factuality
| Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|---|---|
| [HellaSwag][hellaswag] | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 |
| [BoolQ][boolq] | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 |
| [PIQA][piqa] | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 |
| [SocialIQA][socialiqa] | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 |
| [TriviaQA][triviaqa] | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 |
| [Natural Questions][naturalq] | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 |
| [ARC-c][arc] | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 |
| [ARC-e][arc] | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 |
| [WinoGrande][winogrande] | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 |
| [BIG-Bench Hard][bbh] | few-shot | 28.4 | 50.9 | 72.6 | 77.7 |
| [DROP][drop] | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 |
STEM and code
| Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|---|
| [MMLU][mmlu] | 5-shot | 59.6 | 74.5 | 78.6 |
| [MMLU][mmlu] (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 |
| [AGIEval][agieval] | 3-5-shot | 42.1 | 57.4 | 66.2 |
| [MATH][math] | 4-shot | 24.2 | 43.3 | 50.0 |
| [GSM8K][gsm8k] | 8-shot | 38.4 | 71.0 | 82.6 |
| [GPQA][gpqa] | 5-shot | 15.0 | 25.4 | 24.3 |
| [MBPP][mbpp] | 3-shot | 46.0 | 60.4 | 65.6 |
| [HumanEval][humaneval] | 0-shot | 36.0 | 45.7 | 48.8 |
Multilingual
| Benchmark | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|---|
| [MGSM][mgsm] | 2.04 | 34.7 | 64.3 | 74.3 |
| [Global-MMLU-Lite][global-mmlu-lite] | 24.9 | 57.0 | 69.4 | 75.7 |
| [WMT24++][wmt24pp] (ChrF) | 36.7 | 48.4 | 53.9 | 55.7 |
| [FloRes][flores] | 29.5 | 39.2 | 46.0 | 48.8 |
| [XQuAD][xquad] (all) | 43.9 | 68.0 | 74.5 | 76.8 |
| [ECLeKTic][eclektic] | 4.69 | 11.0 | 17.2 | 24.4 |
| [IndicGenBench][indicgenbench] | 41.4 | 57.2 | 61.7 | 63.4 |
Multimodal
| Benchmark | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
|---|---|---|---|
| [COCOcap][coco-cap] | 102 | 111 | 116 |
| [DocVQA][docvqa] (val) | 72.8 | 82.3 | 85.6 |
| [InfoVQA][info-vqa] (val) | 44.1 | 54.8 | 59.4 |
| [MMMU][mmmu] (pt) | 39.2 | 50.3 | 56.1 |
| [TextVQA][textvqa] (val) | 58.9 | 66.5 | 68.6 |
| [RealWorldQA][realworldqa] | 45.5 | 52.2 | 53.9 |
| [ReMI][remi] | 27.3 | 38.5 | 44.8 |
| [AI2D][ai2d] | 63.2 | 75.2 | 79.0 |
| [ChartQA][chartqa] | 63.6 | 74.7 | 76.3 |
| [VQAv2][vqav2] | 63.9 | 71.2 | 72.9 |
| [BLINK][blinkvqa] | 38.0 | 35.9 | 39.6 |
| [OKVQA][okvqa] | 51.0 | 58.7 | 60.2 |
| [TallyQA][tallyqa] | 42.5 | 51.8 | 54.3 |
| [SpatialSense VQA][ss-vqa] | 50.9 | 60.0 | 59.4 |
| [CountBenchQA][countbenchqa] | 26.1 | 17.8 | 68.0 |
📄 License
This model is released under the Gemma license. To access Gemma on Hugging Face, you are required to review and agree to Google's usage license; make sure you are logged in to Hugging Face and acknowledge the license on the model page. Requests are processed immediately.
Citation
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
⚠️ Important Note
This is an experimental requantization. The author wanted to test whether a requantized QAT model performs better than a BF16 model quantized to the same bit level. Please leave feedback.
Both models in the tests have fundamental misunderstandings of TLS/SSL mechanics. For a production-grade solution, direct inspection of the TLS handshake (e.g., via SslStream) and support for post-quantum algorithms would be required.
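As the note above says, a real check must inspect the negotiated handshake rather than HTTP headers. One building block is classifying the negotiated key-exchange group: only post-quantum or hybrid groups (e.g. X25519MLKEM768) provide quantum-safe key agreement, while classical ECDHE/RSA groups do not. A minimal, illustrative heuristic in Python (the group names listed are examples, not an exhaustive registry; this is a sketch, not a production check):

```python
# Hybrid/post-quantum key-exchange group names (illustrative examples only).
PQ_HYBRID_GROUPS = {"X25519MLKEM768", "X25519Kyber768Draft00", "SecP256r1MLKEM768"}
# Classical groups whose key agreement is not quantum-safe.
CLASSICAL_GROUPS = {"x25519", "secp256r1", "secp384r1", "ffdhe2048"}

def is_quantum_safe_key_exchange(group_name: str) -> bool:
    """Classify a negotiated TLS key-exchange group name.

    Returns True only for known PQ/hybrid groups; unknown names are treated
    conservatively as not quantum-safe rather than guessed at.
    """
    if group_name in PQ_HYBRID_GROUPS:
        return True
    if group_name.lower() in CLASSICAL_GROUPS:
        return False
    # Unknown group: do not claim quantum safety.
    return False

print(is_quantum_safe_key_exchange("X25519MLKEM768"))  # True
print(is_quantum_safe_key_exchange("x25519"))          # False
```

Note that this only classifies a name obtained from an actual handshake; negotiating TLS 1.3, by itself, says nothing about quantum safety.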
💡 Usage Tip
When using the model, pay attention to the input requirements, such as text format and image resolution. Also note that using GGUFs with Ollama via Hugging Face does not currently support image inputs. Please check the [docs on running gated repositories](https://huggingface.co/docs/hub/en/ollama#run-private-ggufs-from-the-hugging-face-hub).

