# MedGemma
MedGemma is a collection of Gemma 3 variants trained for medical text and image comprehension, helping developers accelerate healthcare AI application development.
## Quick Start

To start running the model locally on a GPU, first install the Transformers library. Gemma 3 is supported starting with transformers 4.50.0.

```bash
$ pip install -U transformers
```

Then use the following example code snippets to get started.

### Basic Usage
```python
from transformers import pipeline
from PIL import Image
import requests
import torch

# Load the multimodal pipeline on GPU in bfloat16.
pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-pt",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Fetch a sample chest X-ray (Wikimedia requires a User-Agent header).
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

# The pre-trained checkpoint expects a raw prompt with an image placeholder token.
output = pipe(
    images=image,
    text="<start_of_image> findings:",
    max_new_tokens=100,
)
print(output[0]["generated_text"])
```
### Advanced Usage
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-4b-pt"

# Load the model in bfloat16 and let Accelerate place it on available devices.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Fetch a sample chest X-ray and convert it to RGB.
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(
    requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw
).convert("RGB")

prompt = "<start_of_image> findings:"
inputs = processor(
    text=prompt, images=image, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

# Decode only the newly generated tokens, not the echoed prompt.
input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```
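The snippets above use the pre-trained checkpoint (`google/medgemma-4b-pt`), which expects a raw `<start_of_image>` prompt. If you prefer a conversational interface, the sketch below assumes the instruction-tuned variant `google/medgemma-4b-it` and the standard Gemma 3 chat-message format accepted by the `image-text-to-text` pipeline; treat it as a starting point rather than a definitive recipe.

```python
from transformers import pipeline
from PIL import Image
import requests
import torch

# Assumes the instruction-tuned checkpoint follows the standard Gemma 3 chat format.
pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

# Chat-style messages; images are embedded directly in the user turn.
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are an expert radiologist."}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray."},
        ],
    },
]

output = pipe(text=messages, max_new_tokens=200)
# The pipeline returns the full conversation; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```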
## Features

- **Medical Focus**: MedGemma is a collection of Gemma 3 variants trained for performance on medical text and image comprehension.
- **Multiple Variants**: It currently comes in two variants: a 4B multimodal version and a 27B text-only version.
- **High Performance**: Outperforms the base Gemma 3 models across the tested multimodal and text-only health benchmarks.
## Documentation

### Model information

MedGemma is a collection of Gemma 3 variants trained for medical text and image comprehension. Developers can use it to accelerate building healthcare-based AI applications. It currently comes in two variants: a 4B multimodal version and a 27B text-only version.
### Model architecture overview

MedGemma is built on Gemma 3 and uses the same decoder-only transformer architecture.
### Technical specifications

| Property | Details |
|---|---|
| Model Type | Decoder-only Transformer architecture; see the Gemma 3 technical report |
| Modalities | 4B: Text, vision; 27B: Text only |
| Attention mechanism | Grouped-query attention (GQA) |
| Context length | Supports long context, at least 128K tokens |
| Key publication | Coming soon |
| Model created | May 20, 2025 |
| Model version | 1.0.0 |
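For intuition, grouped-query attention lets several query heads share one key/value head, shrinking the KV cache relative to full multi-head attention. Below is a minimal PyTorch sketch with illustrative dimensions; the head counts are placeholders, not MedGemma's actual configuration.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: 8 query heads share 2 key/value heads (4 per group).
batch, seq, n_q_heads, n_kv_heads, head_dim = 1, 16, 8, 2, 32
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each KV head across its group of query heads, then run standard attention.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (batch, n_q_heads, seq, head_dim)
```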
### Inputs and outputs

**Input:**

- Text string, such as a question or prompt
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each (see the token-budget sketch below)
- Total input length of 128K tokens

**Output:**

- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output length of 8192 tokens
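To make the budget concrete: each image consumes a fixed 256 tokens of the input context, so the text budget shrinks linearly with the number of images. A small illustrative calculation, treating 128K as 128,000 tokens (the exact limit may differ):

```python
# Illustrative token budgeting for a multimodal prompt.
# Constants come from the "Inputs and outputs" list above.
IMAGE_TOKENS = 256          # each 896 x 896 image encodes to 256 tokens
MAX_INPUT_TOKENS = 128_000  # approximate total input context

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text after reserving space for the images."""
    return MAX_INPUT_TOKENS - num_images * IMAGE_TOKENS

# e.g. a prompt with four images leaves 126,976 tokens for text
print(remaining_text_budget(4))
```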
### Performance and validation

MedGemma was evaluated across a range of multimodal classification, report generation, visual question answering, and text-based tasks.
#### Imaging evaluations

| Task and metric | MedGemma 4B | Gemma 3 4B |
|---|---|---|
| **Medical image classification** | | |
| MIMIC CXR - Average F1 for top 5 conditions | 88.9 | 81.1 |
| CheXpert CXR - Average F1 for top 5 conditions | 48.1 | 31.2 |
| DermMCQA* - Accuracy | 71.8 | 42.6 |
| **Visual question answering** | | |
| SlakeVQA (radiology) - Tokenized F1 | 62.3 | 38.6 |
| VQA-Rad** (radiology) - Tokenized F1 | 49.9 | 38.6 |
| PathMCQA (histopathology, internal***) - Accuracy | 69.8 | 37.1 |
| **Knowledge and reasoning** | | |
| MedXpertQA (text + multimodal questions) - Accuracy | 18.8 | 16.4 |
#### Chest X-ray report generation

| Metric | MedGemma 4B (pre-trained) | PaliGemma 2 3B (tuned for CXR) | PaliGemma 2 10B (tuned for CXR) |
|---|---|---|---|
| MIMIC CXR - RadGraph F1 | 29.5 | 28.8 | 29.5 |
#### Text evaluations

| Metric | MedGemma 27B | Gemma 3 27B | MedGemma 4B | Gemma 3 4B |
|---|---|---|---|---|
| MedQA (4-op) | 89.8 (best-of-5), 87.7 (0-shot) | 74.9 | 64.4 | 50.7 |
| MedMCQA | 74.2 | 62.6 | 55.7 | 45.4 |
| PubMedQA | 76.8 | 73.4 | 73.4 | 68.4 |
| MMLU Med (text only) | 87.0 | 83.3 | 70.0 | 67.2 |
| MedXpertQA (text only) | 26.7 | 15.7 | 14.2 | 11.6 |
| AfriMed-QA | 84.0 | 72.0 | 52.0 | 48.0 |
### Ethics and safety evaluation

#### Evaluation approach

Our evaluation methods include structured evaluations and internal red-teaming of relevant content policies. These models were evaluated against categories relevant to ethics and safety, including child safety, content safety, representational harms, and general medical harms.
## Technical Details

### Model information

MedGemma 4B uses a SigLIP image encoder pre-trained on a variety of de-identified medical data, and its LLM component is trained on diverse medical data. MedGemma 4B is available in pre-trained and instruction-tuned versions. MedGemma 27B is trained exclusively on medical text and is optimized for inference-time computation.
### Citation

A technical report is coming soon. In the meantime, if you publish using this model, please cite the Hugging Face model page:

```bibtex
@misc{medgemma-hf,
    author = {Google},
    title = {MedGemma Hugging Face},
    howpublished = {\url{https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4}},
    year = {2025},
    note = {Accessed: [Insert Date Accessed, e.g., 2025-05-20]}
}
```
## License

The use of MedGemma is governed by the Health AI Developer Foundations terms of use.
## Important Note

To access MedGemma on Hugging Face, you must review and agree to the Health AI Developer Foundations terms of use. To do this, make sure you're logged in to Hugging Face and click below. Requests are processed immediately.
## Usage Tip

For all MedGemma 27B results, test-time scaling is used to improve performance. To use the model at scale, we recommend creating a production version via Model Garden.
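This card does not spell out the exact test-time scaling recipe (the MedQA row above reports a best-of-5 score). One common form is self-consistency: sample several answers and keep the most frequent. A minimal sketch, where `generate_answer` is a hypothetical stand-in for a single sampled model call, not a MedGemma API:

```python
from collections import Counter
from typing import Callable

def self_consistency(generate_answer: Callable[[str], str], question: str, k: int = 5) -> str:
    """Sample k answers (with sampling enabled) and return the majority vote."""
    answers = [generate_answer(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```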