🚀 Gemma-3 12B Instruct GGUF Models
Gemma-3 12B Instruct GGUF Models are experimental requantizations. The goal is to test whether requantizing Google's quantization-aware-trained (QAT) Q4_0 release performs better than quantizing the original bf16 model to the same bit level.
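For reference, a requantization of this kind can be produced with llama.cpp's llama-quantize tool. The file names below are illustrative; the --allow-requantize flag is what permits quantizing tensors that are already quantized:

```
# Quantize the original bf16 release directly to Q3_K_L (file names illustrative)
./llama-quantize gemma-3-4b-it-bf16.gguf google_gemma-3-4b-it-q3_k_l.gguf Q3_K_L

# Requantize the QAT Q4_0 release to the same level
./llama-quantize --allow-requantize gemma-3-4b-it-qat-q4_0.gguf gemma-3-4b-it-qat-q4_0-q3_k_l.gguf Q3_K_L
```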
🚀 Quick Start
Model Comparison Test
The author tested two versions of the 4B model: one quantized from bf16 and one requantized from the QAT Q4_0 release. Both were built with the same tensor quantization settings.
```
python3 ~/code/GGUFModelBuilder/perp_test_2_files.py ./gemma-3-4b-it-qat-q4_0-q3_k_l.gguf ./google_gemma-3-4b-it-q3_k_l.gguf

Testing model: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf
Running: llama.cpp/llama-perplexity -m gemma-3-4b-it-qat-q4_0-q3_k_l.gguf -f perplexity_test_data.txt --ctx-size 256 --ppl-stride 32 --chunks 1 --threads 4
[✓] Perplexity: 4.0963 (Time: 284.70s)

Testing model: google_gemma-3-4b-it-q3_k_l.gguf
Running: llama.cpp/llama-perplexity -m google_gemma-3-4b-it-q3_k_l.gguf -f perplexity_test_data.txt --ctx-size 256 --ppl-stride 32 --chunks 1 --threads 4
[✓] Perplexity: 4.5557 (Time: 287.15s)

=== Comparison Results ===
Model 1: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf - Perplexity: 4.10 (Time: 284.70s)
Model 2: google_gemma-3-4b-it-q3_k_l.gguf - Perplexity: 4.56 (Time: 287.15s)
Winner: gemma-3-4b-it-qat-q4_0-q3_k_l.gguf (Difference: 0.46)
```
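The comparison script itself is not included in this card, but a minimal sketch of such a harness might look like the following. It assumes llama-perplexity prints a "Final estimate: PPL = …" line (current llama.cpp builds do) and reuses the exact flags from the runs above; everything else is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a two-model perplexity comparison harness."""
import re
import subprocess
import sys
import time

def run_perplexity(model_path: str, data_file: str = "perplexity_test_data.txt") -> tuple[float, float]:
    """Run llama-perplexity on one GGUF file and return (perplexity, seconds)."""
    cmd = [
        "llama.cpp/llama-perplexity", "-m", model_path, "-f", data_file,
        "--ctx-size", "256", "--ppl-stride", "32", "--chunks", "1", "--threads", "4",
    ]
    start = time.time()
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    elapsed = time.time() - start
    # llama-perplexity reports e.g. "Final estimate: PPL = 4.0963 +/- ..."
    match = re.search(r"Final estimate: PPL = ([\d.]+)", proc.stdout + proc.stderr)
    if match is None:
        raise RuntimeError(f"could not parse perplexity for {model_path}")
    return float(match.group(1)), elapsed

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} model1.gguf model2.gguf")
    results = {m: run_perplexity(m) for m in sys.argv[1:]}
    for model, (ppl, secs) in results.items():
        print(f"{model} - Perplexity: {ppl:.2f} (Time: {secs:.2f}s)")
    winner = min(results, key=lambda m: results[m][0])
    low, high = sorted(p for p, _ in results.values())
    print(f"Winner: {winner} (Difference: {high - low:.2f})")
```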
Code Functionality Comparison
The author also compared the two models' code output when each was asked to write .NET code that tests whether a website uses quantum-safe encryption.
Technical Accuracy
- QAT q4_0 Model: Checks both the TLS version and cipher suites, which are critical for assessing quantum resistance, and explicitly acknowledges its limitations.
- BF16 Model: Relies on checking for a non-standard TLS/1.3 header that does not exist in HTTP responses; its logic is incorrect.
Code Quality
- QAT q4_0 Model: Uses modern async/await patterns for non-blocking I/O, separates concerns into methods, and includes robust error handling and logging.
- BF16 Model: Uses blocking synchronous code, which violates .NET best practices and risks deadlocks, and is poorly structured.
Security Relevance
- QAT q4_0 Model: Focuses on cipher suites and mentions the need to update cipher lists based on NIST guidelines.
- BF16 Model: Misleadingly claims to check for "AES-256-CBC" but never implements it and fails to address cipher suites.
Realism
- QAT q4_0 Model: Acknowledges the complexity of quantum-safe detection and clarifies that HTTP-based checks are insufficient.
- BF16 Model: Implies that checking for TLS 1.3 guarantees quantum safety, which is false.
Usability
- QAT q4_0 Model: Provides clear console output and includes a working Main method with an example URL.
- BF16 Model: Fails to compile due to syntax errors and lacks meaningful output.
Critical Flaws in Both Models
- Header Misuse: Both models incorrectly assume the TLS version and cipher suites are exposed in HTTP headers (see the sketch after this list).
- Quantum-Safe Misunderstanding: Neither model's code checks for post-quantum algorithms, so both produce false positives.
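To make the header-misuse point concrete, here is a minimal Python sketch (not taken from either model's output) showing where the TLS version and cipher suite actually come from: the TLS handshake, exposed through the standard-library ssl module, never through HTTP response headers. The host name is just an example.

```python
import socket
import ssl

def inspect_tls(host: str, port: int = 443) -> None:
    """Print the negotiated TLS version and cipher suite for a host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # These values come from the handshake, not from HTTP headers.
            print(f"TLS version : {tls.version()}")    # e.g. 'TLSv1.3'
            print(f"Cipher suite: {tls.cipher()[0]}")  # e.g. 'TLS_AES_256_GCM_SHA384'
            # Caveat: a modern cipher suite does NOT imply quantum safety.
            # Post-quantum key exchange (e.g. hybrid X25519 + ML-KEM) is not
            # visible through this API and requires lower-level handshake
            # inspection to detect.

inspect_tls("example.com")
```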
Final Verdict
The QAT q4_0 model's code is superior, but both models fail to solve the original problem due to fundamental misunderstandings of TLS/SSL mechanics. Further investigation is needed.
✨ Features
- Multimodal Capability: Gemma 3 models can handle text and image input and generate text output.
- Large Context Window: A 128K context window (32K for the 1B size) enables long-form input.
- Multilingual Support: Supports over 140 languages.
- Open Weights: Both pre-trained and instruction-tuned variants have open weights.
- Suitable for Limited Resources: Its relatively small size allows deployment in environments with limited resources.
💻 Usage Examples
Basic Usage
llama.cpp (text-only)
```
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```
llama.cpp (image input)
```
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
ollama (text-only)
```
ollama run hf.co/google/gemma-3-12b-it-qat-q4_0-gguf
```
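To script these calls rather than run them interactively, a small Python wrapper around llama-cli might look like this. It is a sketch: it assumes llama-cli sits in the working directory and uses only the flags shown above.

```python
import subprocess

def run_gemma(prompt: str, model: str = "google/gemma-3-12b-it-qat-q4_0-gguf") -> str:
    """Run a single prompt through llama-cli and return its raw stdout."""
    # Flags mirror the CLI examples above; recent llama.cpp builds may also
    # need "-no-cnv" to skip interactive conversation mode.
    result = subprocess.run(
        ["./llama-cli", "-hf", model, "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

print(run_gemma("Write a poem about the Kraken."))
```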
📚 Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google. Built on the same technology as Gemini models, Gemma 3 models are multimodal, handling text and image input and generating text output. They have open weights for both pre-trained and instruction-tuned variants. With a large 128K context window, multilingual support in over 140 languages, and more size options than previous versions, Gemma 3 models are suitable for various text generation and image understanding tasks. Their relatively small size allows deployment in resource-limited environments, democratizing access to advanced AI models.
Inputs and outputs
| Property | Details |
|----------|---------|
| Input | Text strings, such as a question, a prompt, or a document to be summarized; images, normalized to 896 x 896 resolution and encoded to 256 tokens each; total input context of 128K tokens for the 4B, 12B, and 27B sizes (32K for the 1B size) |
| Output | Generated text in response to the input, such as an answer to a question, an analysis of image content, or a summary of a document; total output context of 8192 tokens |
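As a rough illustration of the figures above, the sketch below (a hypothetical planning aid, not an official API) estimates how much room remains for text once images and output are accounted for; it assumes generated tokens are reserved out of the same window:

```python
CONTEXT_WINDOW = 128_000   # 4B/12B/27B sizes (use 32_000 for the 1B size)
TOKENS_PER_IMAGE = 256     # each image is normalized to 896x896 -> 256 tokens
OUTPUT_RESERVE = 8_192     # maximum output context per the table above

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text input after images and reserved output."""
    return CONTEXT_WINDOW - num_images * TOKENS_PER_IMAGE - OUTPUT_RESERVE

print(remaining_text_budget(4))  # -> 118784
```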
Model Data
Training Dataset
These models were trained on a diverse dataset that includes web documents, code, mathematics, and images. The 12B model was trained on 12 trillion tokens, the 4B model on 4 trillion, and the 1B model on 2 trillion. The combination of these data sources is crucial for training a capable multimodal model.
Data Preprocessing
- CSAM Filtering: Rigorous CSAM filtering was applied at multiple stages to exclude harmful and illegal content.
- Sensitive Data Filtering: Automated techniques were used to filter out certain personal information and other sensitive data.
- Additional Methods: Filtering based on content quality and safety, in line with [our policies][safety-policies].
Implementation Information
Hardware
Gemma was trained using [Tensor Processing Unit (TPU)][tpu] hardware (TPUv4p, TPUv5p, and TPUv5e). TPUs offer advantages in performance, memory, scalability, and cost-effectiveness for training vision-language models.
Software
Training was done using [JAX][jax] and [ML Pathways][ml-pathways]. JAX allows for faster and more efficient training on modern hardware, and ML Pathways is suitable for building general-purpose AI systems.
Intended Usage
- Content Creation and Communication: Text generation, chatbots, text summarization, and image data extraction.
- Research and Education: NLP and VLM research, language learning tools, and knowledge exploration.
Limitations
- Training Data: The quality and diversity of training data can influence the model's capabilities.
- Context and Task Complexity: Models are better at well-defined tasks and can be affected by the amount of context provided.
- Language Ambiguity and Nuance: Models may struggle with subtle language nuances.
- Factual Accuracy: They may generate incorrect or outdated factual statements.
- Common Sense: Models may lack the ability to apply common sense reasoning.
Ethical Considerations and Risks
- Bias and Fairness: VLMs can reflect socio-cultural biases in the training data.
- Misinformation and Misuse: They can be misused to generate false or harmful content.
- Transparency and Accountability: The model card provides details on the model's architecture, capabilities, limitations, and evaluation processes.
Citation
```
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}
```
🔧 Technical Details
- Model Page: Gemma
- Terms of Use: [Terms][terms]
- Authors: Google DeepMind
📄 License
The model is released under the Gemma license. To access Gemma on Hugging Face, you're required to review and agree to Google's usage license. To do so, make sure you're logged in to Hugging Face and click the "Acknowledge license" button. Requests are processed immediately.
[kaggle-gemma]: https://www.kaggle.com/models/google/gemma-3
[vertex-mg-gemma3]: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemma3
[terms]: https://ai.google.dev/gemma/terms
[safety-policies]: https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf
[prohibited-use]: https://ai.google.dev/gemma/prohibited_use_policy
[tpu]: https://cloud.google.com/tpu/docs/intro-to-tpu
[sustainability]: https://sustainability.google/operating-sustainably/
[jax]: https://github.com/jax-ml/jax
[ml-pathways]: https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/
[gemini-2-paper]: https://arxiv.org/abs/2312.11805