🚀 Gemma 3 27B Instruction-tuned QAT compressed-tensors
This checkpoint was converted from the original QAT GGUF model to the compressed-tensors format, enabling efficient inference with vLLM.
📄 License
The license for this model is Gemma. To access Gemma on Hugging Face, you're required to review and agree to Google's usage license. To do so, make sure you're logged in to Hugging Face and acknowledge the license on the model page. Requests are processed immediately.
📦 Model Information
| Property | Details |
|----------|---------|
| Pipeline Tag | image-text-to-text |
| Base Model | google/gemma-3-27b-it |
🚀 Quick Start
This checkpoint was converted from google/gemma-3-27b-it-qat-q4_0-gguf to the compressed-tensors format with BF16 dtype (hence, the conversion is not lossless).
You can run this with vLLM using the following command:
```bash
vllm serve gaunernst/gemma-3-27b-it-qat-compressed-tensors
```
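Once the server is up, you can query it through vLLM's OpenAI-compatible API. Below is a minimal sketch assuming the default endpoint (http://localhost:8000/v1) and the `openai` Python package; adjust the base URL and model name to match your deployment.

```python
# Minimal sketch: query the vLLM server started above via its
# OpenAI-compatible API. Assumes the default local endpoint and that
# the `openai` package is installed; no real API key is needed locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gaunernst/gemma-3-27b-it-qat-compressed-tensors",
    messages=[{"role": "user", "content": "Write a poem about the Kraken."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Since the model is multimodal, vLLM's chat API should also accept image inputs via `image_url` content parts, as it does for other vision-language models it serves.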
✨ Features
- Multimodal Capability: Handles both text and image inputs, generating text outputs (see the sketch after this list).
- Large Context Window: Supports a 128K context window for most model sizes, enabling handling of long input sequences.
- Multilingual Support: Offers support for over 140 languages.
- Efficient Training: Trained on TPU hardware with JAX and ML Pathways, enabling fast, efficient training at scale.
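For a quick local test of the multimodal interface outside vLLM, here is a minimal sketch using the Transformers `image-text-to-text` pipeline. Note it loads the original BF16 base model (google/gemma-3-27b-it), not this compressed-tensors checkpoint, and assumes a recent `transformers` release with Gemma 3 support plus sufficient GPU memory; the image URL is the sample used later in this card.

```python
# Minimal sketch of Gemma 3's image+text interface via Transformers.
# Loads the original base model, not this compressed-tensors checkpoint.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-27b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])  # the assistant's reply
```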
💻 Usage Examples
Basic Usage
llama.cpp (text-only)
```bash
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```
llama.cpp (image input)
```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
ollama (text-only)
Running GGUFs with Ollama via Hugging Face does not currently support image inputs. Please check the docs on running gated repositories.
```bash
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf
```
📚 Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants.
Inputs and Outputs
- Input:
- Text string, such as a question, a prompt, or a document to be summarized.
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each.
- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size (see the budget sketch after this list).
- Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document.
- Total output context of 8192 tokens.
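As a quick sanity check on these budgets, here is a back-of-the-envelope sketch (assuming "128K" means 128 x 1024 tokens; the per-image cost of 256 tokens comes from the list above):

```python
# Back-of-the-envelope input-budget arithmetic for the 27B model:
# each image costs a fixed 256 tokens after 896 x 896 normalization.
CONTEXT_WINDOW = 128 * 1024  # 131072 input tokens (assumed reading of "128K")
TOKENS_PER_IMAGE = 256

def remaining_text_budget(num_images: int) -> int:
    """Input tokens left for text after reserving space for the images."""
    return CONTEXT_WINDOW - num_images * TOKENS_PER_IMAGE

print(remaining_text_budget(4))  # 130048 tokens left for text alongside 4 images
```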
Model Data
Training Dataset
These models were trained on a dataset that includes a wide variety of sources, such as web documents, code, mathematics, and images. The combination of these diverse data sources is crucial for training a powerful multimodal model.
Data Preprocessing
The training data was processed using several methods, including CSAM filtering, sensitive data filtering, and additional filtering based on content quality and safety.
Implementation Information
Hardware
Gemma was trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p and TPUv5e). TPUs offer several advantages in training vision-language models, such as performance, memory, scalability, and cost-effectiveness.
Software
Training was done using JAX and ML Pathways. These tools allow for faster and more efficient training of large models.
Evaluation
Benchmark Results
The models were evaluated against a large collection of different datasets and metrics, covering various aspects such as reasoning, STEM, code, multilingual, and multimodal capabilities.
Ethics and Safety
Evaluation Approach
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Assurance evaluations are also conducted to inform responsibility and governance decision-making.
Evaluation Results
For all areas of safety testing, we saw major improvements in child safety, content safety, and representational harms relative to previous Gemma models.
Usage and Limitations
Intended Usage
The models have a wide range of applications, including content creation, communication, research, and education.
Limitations
Users should be aware of limitations related to training data, context and task complexity, language ambiguity, and factual accuracy.
🔧 Technical Details
Model Conversion
This checkpoint was converted from google/gemma-3-27b-it-qat-q4_0-gguf to the compressed-tensors format with BF16 dtype.
Quantization
The GGUF corresponds to Q4_0 quantization, which preserves quality similar to bfloat16 while significantly reducing memory requirements.
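To make "Q4_0" concrete: in GGUF, Q4_0 stores weights in blocks of 32 four-bit codes plus one scale per block, and dequantizes each code q as scale * (q - 8). The sketch below illustrates the arithmetic with NumPy; it is a simplified illustration, not the packed on-disk layout.

```python
import numpy as np

# Illustrative Q4_0 dequantization: 32 weights per block, one scale per
# block, codes in [0, 15] recentered around 8. Simplified (unpacked) layout.
BLOCK_SIZE = 32

def dequantize_q4_0(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """codes: (n_blocks, BLOCK_SIZE) ints in [0, 15]; scales: (n_blocks,)."""
    return scales[:, None] * (codes.astype(np.float32) - 8.0)

codes = np.random.randint(0, 16, size=(1, BLOCK_SIZE))  # one example block
weights = dequantize_q4_0(codes, np.array([0.1], dtype=np.float32))
print(weights.shape, weights.min(), weights.max())
```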
📄 License
The model is subject to Google's usage license. To access Gemma on Hugging Face, you’re required to review and agree to this license.
Citation
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
Resources and Technical Documentation
Terms of Use: Terms
Authors
Google DeepMind