🚀 Gemma 3 12B Instruction-tuned QAT AutoAWQ
This project provides the model checkpoint converted from the GGUF format to the AutoAWQ format and the BF16 data type, so users can make efficient use of the model. The vision tower is taken from the official Google repository, preserving high-quality visual processing.
🚀 Quick Start
This checkpoint was converted from https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-gguf to AutoAWQ format and BF16 dtype (hence, not lossless). The vision tower was transplanted from https://huggingface.co/google/gemma-3-12b-it.
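For reference, here is a minimal text-generation sketch with 🤗 Transformers. The repo id below is a hypothetical placeholder for this converted checkpoint, and a recent transformers release with Gemma 3 support (plus `accelerate` for `device_map="auto"`) is assumed:

```python
# Minimal sketch, assuming a transformers release with Gemma 3 support.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "your-org/gemma-3-12b-it-qat-autoawq"  # hypothetical placeholder repo id

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Write a poem about the Kraken."}]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```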
Below is the original model card.
✨ Features
- Multimodal Capability: Handles both text and image input, generating text output.
- Open-Source Weights: Both pre-trained and instruction-tuned variants have open weights.
- Large Context Window: Supports a 128K context window, enabling the handling of long-form content.
- Multilingual Support: Capable of processing over 140 languages.
- Resource-Friendly: Can be deployed on laptops, desktops, or cloud infrastructure with limited resources.
📦 Installation
To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging Face and click the "Acknowledge license" button below. Requests are processed immediately.
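Once the license is accepted, authenticate your environment so gated downloads succeed. A minimal sketch using the `huggingface_hub` client (the CLI equivalent is `huggingface-cli login`):

```python
# Authenticate to Hugging Face so gated checkpoints can be downloaded.
from huggingface_hub import login

login()  # prompts for an access token from https://huggingface.co/settings/tokens
```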
💻 Usage Examples
Basic Usage
llama.cpp (text-only)
```bash
./llama-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```
llama.cpp (image input)
```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-12b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
ollama (text-only)
Using GGUFs with Ollama via Hugging Face does not support image inputs at the moment. Please check the docs on running gated repositories.
```bash
ollama run hf.co/google/gemma-3-12b-it-qat-q4_0-gguf
```
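Since the Ollama path above is text-only, here is a hedged transformers sketch for image input, using the same hypothetical placeholder repo id as in the Quick Start and the image from the llama.cpp example:

```python
# Sketch of image+text inference; model_id is a hypothetical placeholder.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "your-org/gemma-3-12b-it-qat-autoawq"  # placeholder, not the real repo id
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```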
📚 Documentation
Model Information
- Description: Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same technology as Gemini. Gemma 3 models are multimodal, handling text and image input and generating text output.
- Inputs and Outputs:
- Input: Text string or images (normalized to 896 x 896 resolution and encoded to 256 tokens each), with a total input context of 128K tokens for the 4B, 12B, and 27B sizes and 32K tokens for the 1B size (see the budgeting sketch below).
- Output: Generated text, with a total output context of 8192 tokens.
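To make the budget concrete, a small back-of-the-envelope helper using the numbers above (a sketch, not part of the original card; it treats 128K as 128 x 1024 tokens):

```python
# Token budgeting from the spec above: each image costs 256 tokens,
# and the 12B model has a 128K-token input context.
IMAGE_TOKENS = 256
INPUT_CONTEXT = 128 * 1024  # 131072 tokens

def text_budget(num_images: int) -> int:
    """Tokens left for text after reserving space for image embeddings."""
    return INPUT_CONTEXT - num_images * IMAGE_TOKENS

print(text_budget(4))  # -> 130048 tokens remain for text alongside 4 images
```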
Model Data
- Training Dataset: Trained on a diverse dataset including web documents, code, mathematics, and images. The model sizes were trained on different token budgets (14T tokens for 27B, 12T for 12B, 4T for 4B, and 2T for 1B).
- Data Preprocessing: Applied CSAM filtering, sensitive-data filtering, and other filtering methods based on content quality and safety.
Implementation Information
- Hardware: Trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p, and TPUv5e), offering performance, memory, scalability, cost-effectiveness, and alignment with Google's sustainability commitments.
- Software: Trained using JAX and ML Pathways, whose combination provides a single-controller programming model that simplifies the development workflow.
Evaluation
- Benchmark Results: Evaluated against various datasets and metrics in different aspects such as reasoning, STEM and code, multilingual, and multimodal tasks.
- Reasoning and factuality: Tested on benchmarks like HellaSwag, BoolQ, etc.
- STEM and code: Benchmarked on [MMLU][mmlu], [AGIEval][agieval], etc.
- Multilingual: Evaluated using [MGSM][mgsm], [Global-MMLU-Lite][global-mmlu-lite], etc.
- Multimodal: Tested on [COCO Caption][coco-cap], [DocVQA][docvqa], etc.
Ethics and Safety
- Evaluation Approach: Conducted structured evaluations and internal red-teaming testing, covering child safety, content safety, and representational harms. Also includes "assurance evaluations" for release decision-making.
- Evaluation Results: Showed major improvements in safety categories compared to previous Gemma models, with minimal policy violations. However, evaluations were limited to English-language prompts.
Usage and Limitations
- Intended Usage: Can be used for content creation, chatbots, text summarization, image data extraction, research, and education.
- Limitations: Affected by training data quality and diversity, context and task complexity, language ambiguity, and factual accuracy.
🔧 Technical Details
- Model Page: Gemma
- Terms of Use: Terms
- Authors: Google DeepMind
📄 License
This model is released under the Gemma license.
Citation
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
[naturalq]: https://github.com/google-research-datasets/natural-questions
[arc]: https://arxiv.org/abs/1911.01547
[winogrande]: https://arxiv.org/abs/1907.10641
[bbh]: https://paperswithcode.com/dataset/bbh
[drop]: https://arxiv.org/abs/1903.00161
[mmlu]: https://arxiv.org/abs/2009.03300
[agieval]: https://arxiv.org/abs/2304.06364
[math]: https://arxiv.org/abs/2103.03874
[gsm8k]: https://arxiv.org/abs/2110.14168
[gpqa]: https://arxiv.org/abs/2311.12022
[mbpp]: https://arxiv.org/abs/2108.07732
[humaneval]: https://arxiv.org/abs/2107.03374
[mgsm]: https://arxiv.org/abs/2210.03057
[flores]: https://arxiv.org/abs/2106.03193
[xquad]: https://arxiv.org/abs/1910.11856v3
[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite
[wmt24pp]: https://arxiv.org/abs/2502.12404v1
[eclektic]: https://arxiv.org/abs/2502.21228
[indicgenbench]: https://arxiv.org/abs/2404.16816
[coco-cap]: https://cocodataset.org/#home
[docvqa]: https://www.docvqa.org/
[info-vqa]: https://arxiv.org/abs/2104.12756
[mmmu]: https://arxiv.org/abs/2311.16502
[textvqa]: https://textvqa.org/
[realworldqa]: https://paperswithcode.com/dataset/realworldqa
[remi]: https://arxiv.org/html/2406.09175v1
[ai2d]: https://allenai.org/data/diagrams
[chartqa]: https://arxiv.org/abs/2203.10244
[vqav2]: https://visualqa.org/index.html
[blinkvqa]: https://arxiv.org/abs/2404.12390
[okvqa]: https://okvqa.allenai.org/
[tallyqa]: https://arxiv.org/abs/1810.12440
[ss-vqa]: https://arxiv.org/abs/1908.02660
[countbenchqa]: https://github.com/google-research/big_vision/blob/main/big_vision/datasets/countbenchqa/
[safety-policies]: #
[sustainability]: #