Gemma-3 4B Instruct GGUF Models
This project focuses on the experimental requantization of Gemma-3 4B Instruct GGUF models. It tests whether a requantized QAT model outperforms the bf16 model quantized to the same bit level.
Quick Start
Experiment Setup
The author generated importance-matrix (imatrix) files from Google's original QAT Q4_0 quantized model and used them to requantize the model to lower-bit quants.
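The exact commands are not given in the source; assuming the standard llama.cpp tooling, the pipeline would look roughly like this (binary paths, file names, and the calibration text are illustrative):

```shell
# Sketch only: assumes llama.cpp's imatrix and quantize tools are built locally
# and that the QAT Q4_0 model file is present.

# 1. Build an importance matrix from the QAT Q4_0 model over a calibration text.
./llama-imatrix -m gemma-3-4b-it-qat-q4_0.gguf -f calibration.txt -o imatrix.dat

# 2. Requantize to a lower-bit quant (here Q3_K_L), guided by the imatrix.
./llama-quantize --imatrix imatrix.dat \
    gemma-3-4b-it-qat-q4_0.gguf gemma-3-4b-it-qat-q4_0-q3_k_l.gguf Q3_K_L
```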
Test Results
The perplexity comparison produced the following output:
```
python3 ~/code/GGUFModelBuilder/perp_test_2_files.py ./gemma-3-4b-it-qat-q4_0-q3_k_l.gguf ./google_gemma-3-4b-it-q3_k_l.gguf

Testing model: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf
Running: llama.cpp/llama-perplexity -m gemma-3-4b-it-qat-q4_0-q3_k_l.gguf -f perplexity_test_data.txt --ctx-size 256 --ppl-stride 32 --chunks 1 --threads 4
[✓] Perplexity: 4.0963 (Time: 284.70s)

Testing model: google_gemma-3-4b-it-q3_k_l.gguf
Running: llama.cpp/llama-perplexity -m google_gemma-3-4b-it-q3_k_l.gguf -f perplexity_test_data.txt --ctx-size 256 --ppl-stride 32 --chunks 1 --threads 4
[✓] Perplexity: 4.5557 (Time: 287.15s)

=== Comparison Results ===
Model 1: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf - Perplexity: 4.10 (Time: 284.70s)
Model 2: google_gemma-3-4b-it-q3_k_l.gguf - Perplexity: 4.56 (Time: 287.15s)
Winner: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf (Difference: 0.46)
```
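For reference, the headline number reported by llama-perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. A minimal illustration of the arithmetic, using made-up token probabilities:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# Toy sequence of four tokens with assumed model probabilities.
token_probs = [0.5, 0.25, 0.25, 0.5]
ppl = perplexity([math.log(p) for p in token_probs])
print(round(ppl, 4))  # → 2.8284 (a perfect model would score 1.0)
```

A 0.46 gap at this scale is meaningful but, as the conclusion below notes, it comes from a single small chunk of test data.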
Another Test
The author asked both models to write .NET code that tests whether a website uses quantum-safe encryption, then asked Deepseek-R1 to evaluate the outputs.
Evaluation of the Two Models' Outputs
- Technical Accuracy:
- QAT q4_0 Model: Checks both TLS version and cipher suites, aligns with security best practices, and acknowledges limitations.
- BF16 Model: Relies on non-standard headers, contains incorrect logic, and misunderstands how TLS parameters are retrieved.
- Code Quality:
- QAT q4_0 Model: Uses modern async/await patterns, separates concerns, and has robust error handling.
- BF16 Model: Uses blocking synchronous code and has a poor structure.
- Security Relevance:
- QAT q4_0 Model: Focuses on cipher suites and mentions NIST guidelines.
- BF16 Model: Misleadingly claims to check for a deprecated cipher mode and fails to address cipher suites.
- Realism:
- QAT q4_0 Model: Acknowledges the complexity of quantum-safe detection.
- BF16 Model: Incorrectly implies that TLS 1.3 guarantees quantum safety.
- Usability:
- QAT q4_0 Model: Provides clear console output and a working Main method.
- BF16 Model: Fails to compile due to syntax errors and lacks meaningful output.
Critical Flaws in Both Models
- Header Misuse: Both models incorrectly assume TLS version and cipher suites are exposed in HTTP headers.
- Quantum-Safe Misunderstanding: Neither model's code checks for post-quantum algorithms.
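As both critiques note, TLS version and cipher suite come from the handshake, not from HTTP response headers. A hedged sketch of the correct approach (in Python rather than the .NET used in the test, purely for illustration) queries the negotiated parameters directly:

```python
import socket
import ssl

def negotiated_tls(host, port=443, timeout=10):
    """Return (protocol_version, cipher_suite) actually negotiated with host.

    This inspects the TLS handshake itself; HTTP headers never carry this
    information, which was the core flaw in both models' code.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]

def assess(version, cipher_suite):
    """TLS 1.3 is a prerequisite for modern security, but it does not by
    itself guarantee post-quantum key exchange."""
    if version != "TLSv1.3":
        return "outdated protocol: " + str(version)
    return ("TLSv1.3 with " + cipher_suite +
            " (quantum safety not determinable from the suite name alone)")

# Example (requires network access):
# version, cipher = negotiated_tls("example.com")
# print(assess(version, cipher))
```

Even this only gets partway: detecting actual post-quantum key exchange (e.g. hybrid ML-KEM groups) would require inspecting the negotiated key-exchange group, which the standard `ssl` module does not expose.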
Final Verdict
The QAT q4_0 model's code is superior as it follows better coding practices, attempts a more relevant security analysis, and explicitly acknowledges limitations. However, both models fail to solve the original problem due to fundamental misunderstandings of TLS/SSL mechanics.
Overall Conclusion
The perplexity difference was small, and the Deepseek evaluation produced different results on subsequent runs, so further investigation is warranted.
Documentation
Original Gemma 3 model card
- Model Page: Gemma
- Resources and Technical Documentation:
- [Gemma 3 Technical Report][g3-tech-report]
- [Responsible Generative AI Toolkit][rai-toolkit]
- [Gemma on Kaggle][kaggle-gemma]
- [Gemma on Vertex Model Garden][vertex-mg-gemma3]
- Terms of Use: [Terms][terms]
- Authors: Google DeepMind
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state-of-the-art AI models and helping foster innovation for everyone.
Inputs and outputs
| Property | Details |
| --- | --- |
| Input | Text string, such as a question, a prompt, or a document to be summarized; images, normalized to 896 x 896 resolution and encoded to 256 tokens each; total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size |
| Output | Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document; total output context of 8192 tokens |
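The figures above imply a simple token budget for mixed text-and-image prompts. A toy sketch, assuming the "128K" window means exactly 128 × 1024 tokens (the precise value is an assumption):

```python
IMAGE_TOKENS = 256        # each image is encoded to 256 tokens
CONTEXT_4B = 128 * 1024   # assumed exact size of the "128K" context window

def remaining_text_budget(n_images, context=CONTEXT_4B):
    """Tokens left for text after reserving space for n_images images."""
    return context - n_images * IMAGE_TOKENS

print(remaining_text_budget(4))  # → 130048
```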
License
This project is released under the Gemma license.
Important Note
This is an experimental requantization. Please leave feedback.
Usage Tip
Further investigation is recommended due to the small test set and inconsistent results of the Deepseek test.