gemma-3-27b-it-FP8-dynamic: open-source model that accepts vision and text input, generates text output, and deploys efficiently with vLLM

Gemma 3 27b It FP8 Dynamic

Developed by RedHatAI

This is a quantized version of google/gemma-3-27b-it in which weights and activations are quantized to the FP8 data type. It accepts image and text input, produces text output, and can be served efficiently with vLLM.

Image-to-Text

Transformers

English

Open Source License: Apache-2.0

#FP8 Quantization #Multimodal Inference #Efficient Deployment with vLLM

Downloads 1,608

Release Time: 4/28/2025

Model Overview

An FP8-quantized version of gemma-3-27b-it that accepts multimodal input (vision + text), generates text output, and is optimized for inference efficiency.

Model Features

FP8 Quantization

Both weights and activations of the linear layers are quantized to the FP8 data type, which roughly halves the memory needed for those weights compared with 16-bit precision and improves inference efficiency.
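To make the "dynamic" part concrete, here is a pure-Python sketch that simulates FP8 quantization of activations with one scale per token (row), computed at runtime from that row's absolute maximum, and then dequantizes them. It is an illustration under stated assumptions, not the kernel vLLM runs: it assumes the E4M3 "fn" FP8 format (3 mantissa bits, largest finite value 448) and per-token activation scales, which is our reading of the `FP8_DYNAMIC` scheme used in the Creation section. The helper names are hypothetical.

```python
import math

# Assumption: FP8 E4M3 "fn" format (max finite value 448) and per-token
# activation scales, our reading of the FP8_DYNAMIC scheme.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest FP8 E4M3 value, saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), FP8_E4M3_MAX)
    # Normal numbers have exponents >= -6; smaller (subnormal) values share
    # the spacing of exponent -6.
    exp = max(math.floor(math.log2(a)), -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 steps per power of two
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_per_token(x):
    """Quantize each row (token) of x with its own scale, computed at runtime."""
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0  # map row absmax to 448
        q.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q, scales


def dequantize(q, scales):
    """Approximate the original activations: FP8 value times its row's scale."""
    return [[v * s for v in row] for row, s in zip(q, scales)]


# Two "tokens" with very different magnitudes each get their own scale.
x = [[0.5, -1.25, 3.0], [100.0, -0.01, 7.5]]
q, scales = quantize_per_token(x)
x_hat = dequantize(q, scales)
for row, row_hat in zip(x, x_hat):
    print([round(v, 4) for v in row_hat], "vs", row)
```

Each row gets its own scale, so the large-magnitude second token does not reduce the precision of the first. The price is one max-reduction per token at inference time, which is what makes the scales dynamic rather than calibrated ahead of time.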

Multimodal Support

Supports joint input of vision and text, capable of understanding image content and generating relevant text.

Efficient Deployment

Optimized for the vLLM backend, supporting efficient inference deployment.

High Precision Maintenance

After quantization, the model recovers at least 99.2% of the original model's score on every benchmark in the Accuracy table (99.73% on the OpenLLM v1 average and 100.11% on the vision average).

Model Capabilities

Image Content Understanding

Multimodal Text Generation

Visual Question Answering

Context Understanding

Use Cases

Image Understanding

Image Content Description

Generates a text description of an input image

Identifies the main elements and scenes in the image

Intelligent Question Answering

Visual Question Answering

Answers questions about the image content

Scores 51.00% on MMMU (val) and 72.16% on ChartQA, on par with the unquantized model (50.89% and 72.16%)

🚀 gemma-3-27b-it-FP8-Dynamic

This is a quantized version of google/gemma-3-27b-it in which the language model's weights and activations are quantized to FP8, making it suitable for efficient inference with vLLM.

🚀 Quick Start

The model can be quickly deployed using the vLLM backend. Refer to the Usage Examples section for a complete code example.

✨ Features

Multimodal Input: Accepts image and text input and generates text output.
Quantization Optimization: Both weights and activations are quantized to the FP8 data type, enabling efficient inference.
vLLM Compatibility: Supports deployment with vLLM >= 0.5.2, including OpenAI-compatible serving.


💻 Usage Examples

Basic Usage

This model can be deployed efficiently using the vLLM backend.

from vllm import LLM, SamplingParams
from vllm.assets.image import ImageAsset
from transformers import AutoProcessor

# Define model name once
model_name = "RedHatAI/gemma-3-27b-it-FP8-dynamic"

# Load a sample image bundled with vLLM's assets, plus the model's processor
image = ImageAsset("cherry_blossom").pil_image.convert("RGB")
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Build a single-turn multimodal prompt. No empty assistant turn is needed:
# add_generation_prompt=True already opens the model's reply.
chat = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is the content of this image?"}]},
]
prompt = processor.apply_chat_template(chat, add_generation_prompt=True)

# Initialize model; capping max_model_len (optional) keeps the KV-cache allocation small
llm = LLM(model=model_name, trust_remote_code=True, max_model_len=4096)

# Run inference: the image is passed alongside the text prompt
inputs = {"prompt": prompt, "multi_modal_data": {"image": [image]}}
outputs = llm.generate(inputs, SamplingParams(temperature=0.2, max_tokens=64))

# Display result
print("RESPONSE:", outputs[0].outputs[0].text)

vLLM also supports OpenAI-compatible serving. See the vLLM documentation for more details.
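As a minimal sketch of that route, assuming the model is served with `vllm serve RedHatAI/gemma-3-27b-it-FP8-dynamic` on vLLM's default port 8000, the snippet below builds an OpenAI-style chat completion request that combines an image URL with a text question. It uses only the Python standard library; the image URL is a hypothetical placeholder, and nothing is sent until `send` is called.

```python
import json
import urllib.request

MODEL = "RedHatAI/gemma-3-27b-it-FP8-dynamic"
# Assumption: `vllm serve` is running locally on its default port.
ENDPOINT = "http://localhost:8000/v1/chat/completions"


def build_request(image_url: str, question: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request with one image and one question."""
    payload = {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "temperature": 0.2,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> str:
    """POST the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Hypothetical image URL; replace it with a real, reachable image.
req = build_request("https://example.com/cherry_blossom.jpg", "What is the content of this image?")
# print(send(req))  # uncomment once the server is up
```

The same message structure can also be sent with the official `openai` Python client by pointing its `base_url` at `http://localhost:8000/v1`.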

📚 Documentation

Model Overview

Property	Details
Model Type	gemma-3-27b-it
Input	Vision + text
Output	Text
Weight quantization	FP8
Activation quantization	FP8
Release Date	2/24/2025
Version	1.0
Model Developers	Neural Magic

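This model was obtained by quantizing the weights and activations of the linear layers of google/gemma-3-27b-it to the FP8 data type (the vision tower, multimodal projector, and lm_head are left unquantized, per the recipe in the Creation section), and is ready for inference with vLLM >= 0.5.2.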
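As rough arithmetic behind the efficiency claim: FP8 stores one byte per parameter and BF16 stores two. The hypothetical helper below estimates weight storage only; it ignores the KV cache, activations, and quantization scale tensors, and the split between quantized and unquantized parameters is an input you supply, not a measured figure for this checkpoint.

```python
def weight_memory_gb(quantized_params: float, unquantized_params: float,
                     quantized_bytes: int = 1, unquantized_bytes: int = 2) -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes).

    quantized_params: parameters stored in FP8 (1 byte each by default).
    unquantized_params: parameters kept in BF16 (2 bytes each by default),
    e.g. modules excluded by the recipe's ignore list.
    """
    total_bytes = quantized_params * quantized_bytes + unquantized_params * unquantized_bytes
    return total_bytes / 1e9


# Illustrative bounds for a 27B-parameter model (not measured values).
bf16_gb = weight_memory_gb(0, 27e9)     # every parameter in BF16
fp8_all_gb = weight_memory_gb(27e9, 0)  # if every parameter were FP8
print(f"BF16: {bf16_gb:.0f} GB, all-FP8 lower bound: {fp8_all_gb:.0f} GB")
```

The real footprint of this checkpoint lies between the two bounds, because the modules in the recipe's `ignore` list and non-Linear modules such as embeddings stay in 16-bit precision.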

Deployment

Use with vLLM

The model can be deployed efficiently using the vLLM backend. The above code example demonstrates the basic process.

Creation

This model was created with llm-compressor by running the following code snippet, which was published as part of a multimodal announcement blog post.

Model Creation Code

from transformers import AutoProcessor, Gemma3ForConditionalGeneration
# Newer llm-compressor releases also expose this as `from llmcompressor import oneshot`
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Load model.
model_id = "google/gemma-3-27b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Recipe: FP8 weights and dynamic (runtime-scaled) FP8 activations for the
# Linear layers; lm_head, the vision tower, and the multimodal projector stay
# unquantized. FP8_DYNAMIC needs no calibration dataset.
recipe = [
    QuantizationModifier(
        targets="Linear",
        scheme="FP8_DYNAMIC",
        sequential_targets=["Gemma3DecoderLayer"],
        ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
    ),
]

SAVE_DIR = f"{model_id.split('/')[1]}-FP8-Dynamic"

# Perform oneshot quantization and save the compressed model to SAVE_DIR
oneshot(
    model=model,
    recipe=recipe,
    trust_remote_code_model=True,
    output_dir=SAVE_DIR,
)

# Save the processor as well so SAVE_DIR can be loaded directly by vLLM
processor.save_pretrained(SAVE_DIR)
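The recipe quantizes only `Linear` modules and skips any module whose name matches an `ignore` entry, where entries prefixed with `re:` are regular expressions. The sketch below mimics that selection so you can check which layers keep their original precision. It assumes the patterns are applied with `re.match` against fully qualified module names (our understanding of llm-compressor's behaviour, not a reimplementation of it), and the module names in the example are illustrative rather than read from the checkpoint.

```python
import re

IGNORE = ["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]


def is_ignored(name: str, patterns=IGNORE) -> bool:
    """True if a module name matches any ignore entry ('re:' = regex, else exact name)."""
    for p in patterns:
        if p.startswith("re:"):
            if re.match(p[len("re:"):], name):
                return True
        elif p == name:
            return True
    return False


def will_quantize(name: str, module_type: str) -> bool:
    """In this sketch, only Linear layers that are not ignored are quantized to FP8."""
    return module_type == "Linear" and not is_ignored(name)


# Illustrative (hypothetical) module names and types.
modules = [
    ("language_model.model.layers.0.self_attn.q_proj", "Linear"),
    ("language_model.model.layers.0.mlp.down_proj", "Linear"),
    ("language_model.lm_head", "Linear"),
    ("vision_tower.vision_model.encoder.layers.0.mlp.fc1", "Linear"),
    ("language_model.model.embed_tokens", "Embedding"),
]
for name, kind in modules:
    print(f"{'FP8 ' if will_quantize(name, kind) else 'skip'}  {name}")
```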

Evaluation

The model was evaluated with lm-evaluation-harness on the OpenLLM v1 text benchmarks, using the following command:

Evaluation Commands

OpenLLM v1

lm_eval \
  --model vllm \
  --model_args pretrained="<model_name>",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=<n>,gpu_memory_utilization=0.8,enable_chunked_prefill=True,trust_remote_code=True,enforce_eager=True \
  --tasks openllm \
  --batch_size auto
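The `--model_args` value is a single comma-separated list of `key=value` pairs, with `<model_name>` and `<n>` left for you to fill in. As a convenience sketch (the helper name is hypothetical, the arguments are exactly those in the command above, and `tensor_parallel_size=2` is only an example), the snippet below fills the placeholders and returns an argument list that could be passed to `subprocess.run` once lm-evaluation-harness and vLLM are installed.

```python
import shlex


def lm_eval_command(model_name: str, tensor_parallel_size: int) -> list[str]:
    """Fill the placeholders of the OpenLLM v1 evaluation command and return argv."""
    model_args = {
        "pretrained": model_name,
        "dtype": "auto",
        "add_bos_token": True,
        "max_model_len": 4096,
        "tensor_parallel_size": tensor_parallel_size,
        "gpu_memory_utilization": 0.8,
        "enable_chunked_prefill": True,
        "trust_remote_code": True,
        "enforce_eager": True,
    }
    # lm_eval expects one comma-separated key=value string for --model_args
    joined = ",".join(f"{k}={v}" for k, v in model_args.items())
    return ["lm_eval", "--model", "vllm", "--model_args", joined,
            "--tasks", "openllm", "--batch_size", "auto"]


argv = lm_eval_command("RedHatAI/gemma-3-27b-it-FP8-dynamic", tensor_parallel_size=2)
print(shlex.join(argv))
# To run it: subprocess.run(argv, check=True)
```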

Accuracy

Category	Benchmark	google/gemma-3-27b-it	RedHatAI/gemma-3-27b-it-FP8-Dynamic	Recovery (%)
OpenLLM V1	ARC Challenge	72.53%	72.70%	100.24%
OpenLLM V1	GSM8K	92.12%	91.51%	99.34%
OpenLLM V1	HellaSwag	85.78%	85.69%	99.90%
OpenLLM V1	MMLU	77.53%	77.45%	99.89%
OpenLLM V1	TruthfulQA (mc2)	62.20%	62.20%	99.99%
OpenLLM V1	Winogrande	79.40%	78.77%	99.20%
OpenLLM V1	Average Score	78.26%	78.05%	99.73%
Vision Evals	MMMU (val)	50.89%	51.00%	100.22%
Vision Evals	ChartQA	72.16%	72.16%	100.00%
Vision Evals	Average Score	61.53%	61.58%	100.11%
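Recovery is the quantized model's score expressed as a percentage of the baseline score. The snippet below recomputes it for the OpenLLM v1 rows, along with their averages, from the rounded scores shown in the table. Treat it as a sanity check rather than a reproduction: because the displayed scores are rounded, several computed recoveries differ from the listed ones by about 0.01 percentage points (for example, ARC Challenge gives 100.23% here versus 100.24% listed), presumably because the table was computed from unrounded scores.

```python
# (baseline %, FP8 %, recovery % as listed) for the OpenLLM v1 rows of the table.
openllm_v1 = {
    "ARC Challenge": (72.53, 72.70, 100.24),
    "GSM8K": (92.12, 91.51, 99.34),
    "HellaSwag": (85.78, 85.69, 99.90),
    "MMLU": (77.53, 77.45, 99.89),
    "TruthfulQA (mc2)": (62.20, 62.20, 99.99),
    "Winogrande": (79.40, 78.77, 99.20),
}


def recovery(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


for task, (base, fp8, listed) in openllm_v1.items():
    print(f"{task:17s} computed {recovery(base, fp8):6.2f}%  listed {listed:6.2f}%")

# Unweighted mean over the six tasks, as in the "Average Score" row.
n = len(openllm_v1)
avg_base = sum(v[0] for v in openllm_v1.values()) / n
avg_fp8 = sum(v[1] for v in openllm_v1.values()) / n
print(f"Average score: {avg_base:.2f}% (baseline) vs {avg_fp8:.2f}% (FP8)")
```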

📄 License

This project is licensed under the Apache 2.0 License.
