# MedGemma Model
MedGemma is a collection of models trained for medical text and image comprehension. It comes in two variants, 4B and 27B, that help developers accelerate the development of healthcare-based AI applications.
## 🚀 Quick Start
### Prerequisites
First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.
```bash
$ pip install -U transformers
```
### Run the model with the `pipeline` API
```python
from transformers import pipeline
from PIL import Image
import requests
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Download an example chest X-ray from Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this X-ray"},
            {"type": "image", "image": image}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
```
### Run the model directly
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Download an example chest X-ray from Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are an expert radiologist."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this X-ray"},
            {"type": "image", "image": image}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

# Greedy decoding; decode only the newly generated tokens
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```
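The 27B variant is text-only, so it runs through the standard text-generation pipeline instead. Here is a minimal sketch, assuming the instruction-tuned 27B checkpoint is published as `google/medgemma-27b-text-it` (verify the exact id on the collection page):

```python
from transformers import pipeline
import torch

# Assumed checkpoint id for the text-only 27B variant; confirm it on the
# MedGemma collection page before use.
pipe = pipeline(
    "text-generation",
    model="google/medgemma-27b-text-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

messages = [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What are the first-line treatments for hypertension?"},
]

output = pipe(messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])
```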
## ✨ Features
- Multimodal Capability: The 4B version supports both text and vision modalities, while the 27B version focuses on text.
- High Performance: Outperforms the base Gemma 3 models across various multimodal and text-only health benchmarks.
- Long Context Support: Can handle a context length of at least 128K tokens.
## 📦 Installation

```bash
$ pip install -U transformers
```
## 💻 Usage Examples
### Basic Usage
The quick-start snippets above show basic usage: running the model with the `pipeline` API and running the model directly.
### Advanced Usage
For more advanced usage, such as fine-tuning the model, refer to the following Colab notebooks (a minimal fine-tuning sketch follows the list):

- [Quick start notebook in Colab](https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb)
- [Fine-tuning notebook in Colab](https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb)
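As a rough illustration of what parameter-efficient fine-tuning looks like, here is a minimal sketch that attaches LoRA adapters with the `peft` library; the rank, alpha, and target modules below are illustrative assumptions, not the settings used in the official notebook:

```python
from transformers import AutoModelForImageTextToText
from peft import LoraConfig, get_peft_model
import torch

model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative LoRA settings on the attention projections; tune these for
# your task rather than treating them as recommended values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

From here, a standard training loop (for example `transformers.Trainer` or TRL's `SFTTrainer`) can be run on a prepared medical dataset; the notebook above contains the full recipe.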
## 📚 Documentation
### Model information
MedGemma is a collection of Gemma 3 variants. The 4B version utilizes a SigLIP image encoder pre-trained on medical data and is available in both pre-trained and instruction-tuned versions. The 27B version is trained only on medical text and is optimized for inference-time computation.
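For example, switching between the instruction-tuned and pre-trained 4B checkpoints only changes the model id; the pre-trained id below assumes the collection's naming pattern and should be verified on the collection page:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

# Assumed id for the pre-trained (non-instruction-tuned) 4B checkpoint;
# confirm the exact name on the MedGemma collection page.
model_id = "google/medgemma-4b-pt"

model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
```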
### Model architecture overview

The MedGemma model is based on Gemma 3 and uses the same decoder-only transformer architecture. For more details, refer to the Gemma 3 model card.
### Technical specifications

| Property | Details |
| --- | --- |
| Model type | Decoder-only Transformer architecture; see the [Gemma 3 technical report](https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf) |
| Modalities | 4B: text, vision; 27B: text only |
| Attention mechanism | Grouped-query attention (GQA) |
| Context length | Supports long context, at least 128K tokens |
| Key publication | Coming soon |
| Model created | May 20, 2025 |
| Model version | 1.0.0 |
### Inputs and outputs

Input:

- Text string, such as a question or prompt
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
- Total input length of 128K tokens (see the budgeting sketch below)

Output:

- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output length of 8192 tokens
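A quick back-of-the-envelope check of these budgets: at 256 tokens per image, even image-heavy prompts leave most of the window for text. This sketch is illustrative arithmetic only and assumes 128K means 128 × 1024 tokens:

```python
# Illustrative arithmetic only; assumes 128K = 128 * 1024 tokens.
CONTEXT_TOKENS = 128 * 1024
TOKENS_PER_IMAGE = 256  # per the input spec above (896 x 896 images)

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text after encoding num_images images."""
    return CONTEXT_TOKENS - num_images * TOKENS_PER_IMAGE

print(remaining_text_budget(1))    # 130816
print(remaining_text_budget(100))  # 105472
```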
### Performance and validation

MedGemma was evaluated on various multimodal classification, report generation, visual question answering, and text-based tasks.
#### Imaging evaluations

| Task and metric | MedGemma 4B | Gemma 3 4B |
| --- | --- | --- |
| **Medical image classification** | | |
| MIMIC CXR - Average F1 for top 5 conditions | 88.9 | 81.1 |
| CheXpert CXR - Average F1 for top 5 conditions | 48.1 | 31.2 |
| DermMCQA* - Accuracy | 71.8 | 42.6 |
| **Visual question answering** | | |
| SlakeVQA (radiology) - Tokenized F1 | 62.3 | 38.6 |
| VQA-Rad** (radiology) - Tokenized F1 | 49.9 | 38.6 |
| PathMCQA (histopathology, internal***) - Accuracy | 69.8 | 37.1 |
| **Knowledge and reasoning** | | |
| MedXpertQA (text + multimodal questions) - Accuracy | 18.8 | 16.4 |
*Based on [ref](https://www.nature.com/articles/s41591-020-0842-3), presented as a 4-way MCQ per example for skin condition classification.

**On balanced split, see ref.

***Based on multiple datasets, presented as 3-9 way MCQ per example for identification, grading, and subtype for breast, cervical, and prostate cancer.
#### Chest X-ray report generation

| Metric | MedGemma 4B (pre-trained) | PaliGemma 2 3B (tuned for CXR) | PaliGemma 2 10B (tuned for CXR) |
| --- | --- | --- | --- |
| MIMIC CXR - RadGraph F1 | 29.5 | 28.8 | 29.5 |
#### Text evaluations

| Metric | MedGemma 27B | Gemma 3 27B | MedGemma 4B | Gemma 3 4B |
| --- | --- | --- | --- | --- |
| MedQA (4-op) | 89.8 (best-of-5), 87.7 (0-shot) | 74.9 | 64.4 | 50.7 |
| MedMCQA | 74.2 | 62.6 | 55.7 | 45.4 |
| PubMedQA | 76.8 | 73.4 | 73.4 | 68.4 |
| MMLU Med (text only) | 87.0 | 83.3 | 70.0 | 67.2 |
| MedXpertQA (text only) | 26.7 | 15.7 | 14.2 | 11.6 |
| AfriMed-QA | 84.0 | 72.0 | 52.0 | 48.0 |
### Citation

```bibtex
@misc{medgemma-hf,
    author = {Google},
    title = {MedGemma Hugging Face},
    howpublished = {\url{https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4}},
    year = {2025},
    note = {Accessed: [Insert Date Accessed, e.g., 2025-05-20]}
}
```
## 🔧 Technical Details

- The model uses grouped-query attention (GQA) in its attention mechanism; a small sketch follows this list.
- It supports a long context length of at least 128K tokens.
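For intuition, here is a minimal PyTorch sketch of grouped-query attention, in which several query heads share one key/value head to shrink the KV cache; the head counts and dimensions are illustrative, not MedGemma's actual configuration:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only, not MedGemma's real configuration.
batch, seq, head_dim = 2, 16, 64
num_q_heads, num_kv_heads = 8, 2          # 4 query heads share each KV head
group = num_q_heads // num_kv_heads

q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)   # fewer KV heads => smaller KV cache
v = torch.randn(batch, num_kv_heads, seq, head_dim)

# Expand each KV head so its group of query heads can attend to it.
k = k.repeat_interleave(group, dim=1)     # -> (batch, num_q_heads, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)                          # torch.Size([2, 8, 16, 64])
```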
## 📄 License

The use of MedGemma is governed by the [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms).

To access MedGemma on Hugging Face, you must review and agree to the [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms). To do this, make sure you're logged in to Hugging Face and click below. Requests are processed immediately.

[Acknowledge license](https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4)
## Resources

- [Model on Google Cloud Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/medgemma)
- [Model on Hugging Face](https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4)
- [GitHub repository](https://github.com/google-health/medgemma)
- [Quick start notebook](https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb)
- [Fine-tuning notebook](https://github.com/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb)
- Patient Education Demo
- [Contact](https://developers.google.com/health-ai-developer-foundations/medgemma/get-started.md#contact)