🚀 Gemma 3 27B Instruction-tuned QAT compressed-tensors
This checkpoint was converted from the original QAT GGUF model to the compressed-tensors format, enabling efficient inference with vLLM.
📄 License
The license for this model is Gemma. To access Gemma on Hugging Face, you're required to review and agree to Google's usage license. To do so, make sure you're logged in to Hugging Face and acknowledge the license on the model page. Requests are processed immediately.
📦 Model Information
| Property | Details |
|----------|---------|
| Pipeline Tag | image-text-to-text |
| Base Model | google/gemma-3-27b-it |
🚀 Quick Start
This checkpoint was converted from google/gemma-3-27b-it-qat-q4_0-gguf to the compressed-tensors format with BF16 dtype (hence, the conversion is not lossless).
You can run this with vLLM using the following command:
```bash
vllm serve gaunernst/gemma-3-27b-it-qat-compressed-tensors
```
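Once the server is up, you can query it through vLLM's OpenAI-compatible API. Below is a minimal sketch assuming the default endpoint (http://localhost:8000/v1) and the `openai` Python package; adjust the base URL and model name to match your deployment.

```python
# Minimal sketch: query the vLLM server started above via its
# OpenAI-compatible API. Assumes the default local endpoint and that
# the `openai` package is installed; no real API key is needed locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gaunernst/gemma-3-27b-it-qat-compressed-tensors",
    messages=[{"role": "user", "content": "Write a poem about the Kraken."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Since the model is multimodal, vLLM's chat API should also accept image inputs via `image_url` content parts, as it does for other vision-language models it serves.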
✨ Features
- Multimodal Capability: Handles both text and image inputs, generating text outputs (see the sketch after this list).
- Large Context Window: Supports a 128K context window for most model sizes, enabling handling of long input sequences.
- Multilingual Support: Offers support for over 140 languages.
- Efficient Training: Trained on TPU hardware with JAX and ML Pathways, enabling fast, efficient training at scale.
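For a quick local test of the multimodal interface outside vLLM, here is a minimal sketch using the Transformers `image-text-to-text` pipeline. Note it loads the original BF16 base model (google/gemma-3-27b-it), not this compressed-tensors checkpoint, and assumes a recent `transformers` release with Gemma 3 support plus sufficient GPU memory; the image URL is the sample used later in this card.

```python
# Minimal sketch of Gemma 3's image+text interface via Transformers.
# Loads the original base model, not this compressed-tensors checkpoint.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-27b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])  # the assistant's reply
```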
💻 Usage Examples
Basic Usage
llama.cpp (text-only)
```bash
./llama-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Write a poem about the Kraken."
```
llama.cpp (image input)
```bash
wget https://github.com/bebechien/gemma/blob/main/surprise.png?raw=true -O ~/Downloads/surprise.png
./llama-gemma3-cli -hf google/gemma-3-27b-it-qat-q4_0-gguf -p "Describe this image." --image ~/Downloads/surprise.png
```
ollama (text-only)
Running GGUFs with Ollama via Hugging Face does not currently support image inputs. Please check the docs on running gated repositories.
```bash
ollama run hf.co/google/gemma-3-27b-it-qat-q4_0-gguf
```
📚 Documentation
Model Information
Description
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants.
Inputs and Outputs
- Input:
- Text string, such as a question, a prompt, or a document to be summarized.
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each.
- Total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size (see the budget sketch after this list).
- Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document.
- Total output context of 8192 tokens.
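As a quick sanity check on these budgets, here is a back-of-the-envelope sketch (assuming "128K" means 128 x 1024 tokens; the per-image cost of 256 tokens comes from the list above):

```python
# Back-of-the-envelope input-budget arithmetic for the 27B model:
# each image costs a fixed 256 tokens after 896 x 896 normalization.
CONTEXT_WINDOW = 128 * 1024  # 131072 input tokens (assumed reading of "128K")
TOKENS_PER_IMAGE = 256

def remaining_text_budget(num_images: int) -> int:
    """Input tokens left for text after reserving space for the images."""
    return CONTEXT_WINDOW - num_images * TOKENS_PER_IMAGE

print(remaining_text_budget(4))  # 130048 tokens left for text alongside 4 images
```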
Model Data
Training Dataset
These models were trained on a dataset that includes a wide variety of sources, such as web documents, code, mathematics, and images. The combination of these diverse data sources is crucial for training a powerful multimodal model.
Data Preprocessing
The training data was processed using several methods, including CSAM filtering, sensitive data filtering, and additional filtering based on content quality and safety.
Implementation Information
Hardware
Gemma was trained using Tensor Processing Unit (TPU) hardware (TPUv4p, TPUv5p and TPUv5e). TPUs offer several advantages in training vision-language models, such as performance, memory, scalability, and cost-effectiveness.
Software
Training was done using JAX and ML Pathways. These tools allow for faster and more efficient training of large models.
Evaluation
Benchmark Results
The models were evaluated against a large collection of different datasets and metrics, covering various aspects such as reasoning, STEM, code, multilingual, and multimodal capabilities.
Ethics and Safety
Evaluation Approach
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Assurance evaluations are also conducted to inform responsibility and governance decision-making.
Evaluation Results
For all areas of safety testing, we saw major improvements in child safety, content safety, and representational harms relative to previous Gemma models.
Usage and Limitations
Intended Usage
The models have a wide range of applications, including content creation, communication, research, and education.
Limitations
Users should be aware of limitations related to training data, context and task complexity, language ambiguity, and factual accuracy.
🔧 Technical Details
Model Conversion
This checkpoint was converted from google/gemma-3-27b-it-qat-q4_0-gguf to the compressed-tensors format with BF16 dtype.
Quantization
The GGUF corresponds to Q4_0 quantization, which preserves quality similar to bfloat16 while significantly reducing memory requirements.
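To make "Q4_0" concrete: in GGUF, Q4_0 stores weights in blocks of 32 four-bit codes plus one scale per block, and dequantizes each code q as scale * (q - 8). The sketch below illustrates the arithmetic with NumPy; it is a simplified illustration, not the packed on-disk layout.

```python
import numpy as np

# Illustrative Q4_0 dequantization: 32 weights per block, one scale per
# block, codes in [0, 15] recentered around 8. Simplified (unpacked) layout.
BLOCK_SIZE = 32

def dequantize_q4_0(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """codes: (n_blocks, BLOCK_SIZE) ints in [0, 15]; scales: (n_blocks,)."""
    return scales[:, None] * (codes.astype(np.float32) - 8.0)

codes = np.random.randint(0, 16, size=(1, BLOCK_SIZE))  # one example block
weights = dequantize_q4_0(codes, np.array([0.1], dtype=np.float32))
print(weights.shape, weights.min(), weights.max())
```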
📄 License
The model is subject to Google's usage license. To access Gemma on Hugging Face, you’re required to review and agree to this license.
Citation
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
Resources and Technical Documentation
Terms of Use: Terms
Authors
Google DeepMind