🚀 Gemma 3 27B Instruction-tuned INT4
This is a QAT INT4 Flax checkpoint (from Kaggle) converted to GGUF format for easy use. The conversion script is available on GitHub. Note that this is not the same as the official QAT INT4 GGUFs released by Google. Below is the original model card for Google Gemma 3 27B IT.
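The GGUF file can be loaded with llama.cpp or its Python bindings. Below is a minimal sketch using llama-cpp-python; the model filename and the generation settings are assumptions, so substitute the actual .gguf file shipped in this repo.

```python
# Minimal sketch: running the converted GGUF with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3-27b-it-int4.gguf",  # hypothetical filename; use the file from this repo
    n_ctx=8192,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain INT4 quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```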
🚀 Quick Start
Access Gemma on Hugging Face
To access Gemma on Hugging Face, you're required to review and agree to Google's usage license. To do this, please ensure you're logged in to Hugging Face and acknowledge the license on the model page. Requests are processed immediately.
Installation
First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.
```shell
$ pip install -U transformers
```
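You can verify that the installed version is new enough:

```python
import transformers

# Gemma 3 support requires transformers >= 4.50.0
print(transformers.__version__)
```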
Usage Examples
Basic Usage
You can initialize the model and processor for inference with pipeline as follows.
```python
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-27b-it",
    device="cuda",
    torch_dtype=torch.bfloat16
)
```
With instruction-tuned models, you need to use chat templates to process your inputs first. Then, pass them to the pipeline.
```python
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
```
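The last entry of generated_text is the assistant's reply as a chat message, so you can continue the conversation by appending it along with a new user turn. A minimal sketch reusing pipe, messages, and output from above; the follow-up question is illustrative:

```python
# Append the assistant's reply, then ask a follow-up question.
messages.append(output[0]["generated_text"][-1])
messages.append({
    "role": "user",
    "content": [{"type": "text", "text": "What colors appear on the candy?"}]
})

output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
```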
Advanced Usage
Running the model on a single/multi GPU
```python
# Requires accelerate for device_map="auto": pip install accelerate
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "google/gemma-3-27b-it"

# Load the model in bfloat16 so it matches the dtype the inputs are cast to below.
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
).eval()

processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image in detail."}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```
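For interactive use, you can stream tokens to the console as they are generated instead of waiting for the full completion. A minimal sketch using transformers' TextStreamer, reusing model, processor, and inputs from the example above:

```python
from transformers import TextStreamer

# Print decoded tokens as they are generated; skip echoing the prompt.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(**inputs, max_new_tokens=100, do_sample=False, streamer=streamer)
```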
✨ Features
Model Information
- Summary: Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants.
- Inputs and Outputs:
- Input: Text string, images (normalized to 896 x 896 resolution and encoded to 256 tokens each), with a total input context of 128K tokens for the 4B, 12B, and 27B sizes, and 32K tokens for the 1B size (see the context-budget sketch below).
- Output: Generated text in response to the input, with a total output context of 8192 tokens.
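Because each image has a fixed 256-token cost, budgeting the input context is simple arithmetic. Below is an illustrative sketch; treating "128K" as exactly 128,000 tokens is an assumption made for readability.

```python
# Illustrative context-budget arithmetic for the 4B/12B/27B sizes.
CONTEXT_WINDOW = 128_000  # "128K" input context; exact value assumed here
IMAGE_TOKENS = 256        # each image is normalized to 896x896 and encoded to 256 tokens

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text after accounting for the encoded images."""
    return CONTEXT_WINDOW - num_images * IMAGE_TOKENS

print(remaining_text_budget(4))  # 128000 - 4*256 = 126976 tokens left for text
```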
Model Data
- Training Dataset: These models were trained on a dataset that includes web documents, code, mathematics, and images. The 27B model was trained with 14 trillion tokens, the 12B model with 12 trillion tokens, the 4B model with 4 trillion tokens, and the 1B model with 2 trillion tokens.
- Data Preprocessing: Key data cleaning and filtering methods include CSAM filtering, sensitive data filtering, and additional filtering based on content quality and safety.
Implementation Information
📚 Documentation
- Model Page: Gemma
- Resources and Technical Documentation
- Terms of Use: Terms
- Authors: Google DeepMind
Citation
```
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}
```
🔧 Technical Details
Benchmark Results
Reasoning and factuality
| Benchmark | Metric | Gemma 3 PT 1B | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| --- | --- | --- | --- | --- | --- |
| HellaSwag | 10-shot | 62.3 | 77.2 | 84.2 | 85.6 |
| BoolQ | 0-shot | 63.2 | 72.3 | 78.8 | 82.4 |
| PIQA | 0-shot | 73.8 | 79.6 | 81.8 | 83.3 |
| SocialIQA | 0-shot | 48.9 | 51.9 | 53.4 | 54.9 |
| TriviaQA | 5-shot | 39.8 | 65.8 | 78.2 | 85.5 |
| Natural Questions | 5-shot | 9.48 | 20.0 | 31.4 | 36.1 |
| ARC-c | 25-shot | 38.4 | 56.2 | 68.9 | 70.6 |
| ARC-e | 0-shot | 73.0 | 82.4 | 88.3 | 89.0 |
| WinoGrande | 5-shot | 58.2 | 64.7 | 74.3 | 78.8 |
| BIG-Bench Hard | few-shot | 28.4 | 50.9 | 72.6 | 77.7 |
| DROP | 1-shot | 42.4 | 60.1 | 72.2 | 77.2 |
STEM and code
| Benchmark | Metric | Gemma 3 PT 4B | Gemma 3 PT 12B | Gemma 3 PT 27B |
| --- | --- | --- | --- | --- |
| MMLU | 5-shot | 59.6 | 74.5 | 78.6 |
| MMLU (Pro COT) | 5-shot | 29.2 | 45.3 | 52.2 |
| AGIEval | 3-5-shot | 42.1 | 57.4 | 66.2 |
| MATH | 4-shot | 24.2 | 43.3 | 50.0 |
| GSM8K | 8-shot | 38.4 | 71.0 | 82.6 |
| GPQA | 5-shot | 15.0 | 25.4 | 24.3 |
| MBPP | 3-shot | 46.0 | 60.4 | 65.6 |
| HumanEval | 0-shot | 36.0 | 45.7 | 48.8 |
Multilingual
Multimodal
Ethics and Safety
- Evaluation Approach: Our evaluation methods include structured evaluations and internal red-team testing of relevant content policies. These models were evaluated against categories such as child safety, content safety, and representational harms.
- Evaluation Results: For all areas of safety testing, we saw major improvements relative to previous Gemma models. All testing was conducted without safety filters. A limitation was that only English language prompts were included.
Usage and Limitations
- Intended Usage: Open vision-language models (VLMs) have a wide range of applications, including content creation, chatbots, and text summarization.
- Limitations: Users should be aware of limitations common to large vision-language models, such as occasional factual inaccuracies, sensitivity to prompt phrasing, and biases inherited from training data.
📄 License
This model is released under the Gemma Terms of Use.