MedGemma Model Card
MedGemma is a collection of models based on Gemma 3, trained for medical text and image comprehension. It helps developers build healthcare-based AI applications more efficiently.
Quick Start
Installation
First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.
$ pip install -U transformers
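Because Gemma 3 support starts with transformers 4.50.0, it can help to confirm the installed version before loading the model. The following is a minimal sketch using only the standard library. The helper names `parse_version`, `meets_minimum`, and `check_transformers` are illustrative and not part of Transformers, and the parser only handles simple dotted numeric versions (a pre-release suffix such as `rc1` is ignored).

```python
from importlib import metadata

MIN_TRANSFORMERS = "4.50.0"  # minimum version stated in this card


def parse_version(version: str) -> tuple:
    """Turn '4.50.0' or '4.51.0.dev0' into a tuple of at least three integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.50' compares equal to '4.50.0'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_transformers() -> str:
    """Describe whether the locally installed transformers is recent enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; upgrade it"


print(check_transformers())
```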
Basic Usage
from transformers import pipeline
import torch

# Load the instruction-tuned 27B text model in bfloat16 on a CUDA GPU.
pipe = pipeline(
    "text-generation",
    model="google/medgemma-27b-text-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful medical assistant."
    },
    {
        "role": "user",
        "content": "How do you differentiate bacterial from viral pneumonia?"
    }
]

# The text-generation pipeline takes the chat messages as its first positional argument.
output = pipe(messages, max_new_tokens=200)
# The last entry in generated_text is the assistant's reply.
print(output[0]["generated_text"][-1]["content"])
Advanced Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "google/medgemma-27b-text-it"

# device_map="auto" places the weights across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are a helpful medical assistant."
    },
    {
        "role": "user",
        "content": "How do you differentiate bacterial from viral pneumonia?"
    }
]

# Apply the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt.
    generation = generation[0][input_len:]

decoded = tokenizer.decode(generation, skip_special_tokens=True)
print(decoded)
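Both usage examples pass the same chat-style list of a system message followed by a user message. A small helper can build that list for arbitrary questions. This is only a sketch: `build_messages` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names introduced here, not part of Transformers or MedGemma.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful medical assistant."


def build_messages(question: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> list:
    """Build the system + user message list used in the examples above."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages("How do you differentiate bacterial from viral pneumonia?")
print([m["role"] for m in messages])
```

The resulting list can be passed to `pipe(...)` or `tokenizer.apply_chat_template(...)` in place of the hand-written `messages`.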
Features
- Medical Focus: Trained for performance on medical text and image comprehension.
- Two Variants: Available in a 4B multimodal version and a 27B text-only version.
- Good Performance: Outperforms base Gemma models on text-only health benchmarks.
Documentation
Model Information
MedGemma is a collection of Gemma 3 variants. The 27B version is trained exclusively on medical text and optimized for inference-time computation. It is only available as an instruction-tuned model.
Model Architecture Overview
MedGemma is built on Gemma 3 and uses the same decoder-only transformer architecture. For more about the architecture, see the Gemma 3 model card.
Technical Specifications
| Property | Details |
| --- | --- |
| Model Type | Decoder-only Transformer architecture, see the Gemma 3 technical report |
| Modalities | 4B: Text, vision; 27B: Text only |
| Attention Mechanism | Utilizes grouped-query attention (GQA) |
| Context Length | Supports long context, at least 128K tokens |
| Key Publication | Coming soon |
| Model Created | May 20, 2025 |
| Model Version | 1.0.0 |
Inputs and Outputs
Input:
- Text string, such as a question or prompt.
- Total input length of 128K tokens.
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document.
- Total output length of 8192 tokens.
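The limits above can be checked before calling the model. The sketch below is one way to do that under stated assumptions: `plan_generation` is a hypothetical helper, and `MAX_INPUT_TOKENS = 128_000` is a conservative reading of "128K" rather than a value confirmed by this card.

```python
MAX_INPUT_TOKENS = 128_000  # assumption: conservative reading of "128K tokens"
MAX_OUTPUT_TOKENS = 8192    # total output length stated above


def plan_generation(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return a max_new_tokens value that respects the stated limits.

    Raises ValueError if the prompt itself exceeds the input limit.
    """
    if prompt_tokens < 0 or requested_new_tokens < 1:
        raise ValueError("token counts must be positive")
    if prompt_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; the limit is {MAX_INPUT_TOKENS}"
        )
    # Clamp the request to the maximum output length.
    return min(requested_new_tokens, MAX_OUTPUT_TOKENS)


print(plan_generation(prompt_tokens=1_000, requested_new_tokens=200))     # 200
print(plan_generation(prompt_tokens=1_000, requested_new_tokens=10_000))  # 8192
```

In the Advanced Usage example, `prompt_tokens` would correspond to `inputs["input_ids"].shape[-1]`, and the returned value could be passed as `max_new_tokens`.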
Performance and Validation
MedGemma was evaluated across a range of multimodal classification, report generation, visual question answering, and text-based tasks.
Key Performance Metrics
Text Evaluations
| Metric | MedGemma 27B | Gemma 3 27B | MedGemma 4B | Gemma 3 4B |
| --- | --- | --- | --- | --- |
| MedQA (4-op) | 89.8 (best-of-5), 87.7 (0-shot) | 74.9 | 64.4 | 50.7 |
| MedMCQA | 74.2 | 62.6 | 55.7 | 45.4 |
| PubMedQA | 76.8 | 73.4 | 73.4 | 68.4 |
| MMLU Med (text only) | 87.0 | 83.3 | 70.0 | 67.2 |
| MedXpertQA (text only) | 26.7 | 15.7 | 14.2 | 11.6 |
| AfriMed-QA | 84.0 | 72.0 | 52.0 | 48.0 |
For all MedGemma 27B results, test-time scaling is used to improve performance.
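This card does not say how the test-time scaling or the best-of-5 score is computed. One common approach for multiple-choice benchmarks is to sample several answers and take a majority vote, which is sketched below purely as an illustration. The `majority_vote` helper and the sample answers are assumptions, not the method used to produce the reported numbers.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("answers must not be empty")
    # Normalize so that 'b', 'B' and 'B ' count as the same option.
    normalized = [a.strip().upper() for a in answers]
    counts = Counter(normalized)
    best_count = max(counts.values())
    for answer in normalized:
        if counts[answer] == best_count:
            return answer


# Five hypothetical sampled answers to one multiple-choice question.
sampled = ["B", "b", "C", "B ", "A"]
print(majority_vote(sampled))  # B
```

In practice, the sampled answers would come from repeated `generate` calls with sampling enabled, followed by extraction of the chosen option from each response.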
Ethics and Safety Evaluation
Evaluation Approach
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. The models were evaluated against categories such as child safety, content safety, representational harms, and general medical harms.
Evaluation Results
For all areas of safety testing, the model showed safe levels of performance. All testing was conducted without safety filters. A limitation is that evaluations mainly included English language prompts.
Data Card
Dataset Overview
Training: The base Gemma models are pre-trained on a large corpus of text and code data. MedGemma 4B uses a SigLIP image encoder pre-trained on de-identified medical data. Its LLM component is trained on diverse medical data.
Evaluation: MedGemma models have been evaluated on a comprehensive set of clinically relevant benchmarks, including over 22 datasets across 5 different tasks and 6 medical image modalities.
Source: MedGemma uses a combination of public and private datasets, including MIMIC-CXR and Slake-VQA, among others.
Data Ownership and Documentation
- [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.1.0/): MIT Laboratory for Computational Physiology and Beth Israel Deaconess Medical Center (BIDMC).
- [Slake-VQA](https://www.med-vqa.com/slake/): The Hong Kong Polytechnic University.
License
The use of MedGemma is governed by the [Health AI Developer Foundations terms of use](https://developers.google.com/health-ai-developer-foundations/terms). To access MedGemma on Hugging Face, you are required to review and agree to these terms. Make sure you are logged in to Hugging Face, then click the "Acknowledge license" button. Requests are processed immediately.
Citation
A technical report is coming soon. In the meantime, if you publish using this model, please cite the Hugging Face model page:
@misc{medgemma-hf,
    author = {Google},
    title = {MedGemma Hugging Face},
    howpublished = {\url{https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4}},
    year = {2025},
    note = {Accessed: [Insert Date Accessed, e.g., 2025-05-20]}
}