# MedGemma
MedGemma is a collection of Gemma 3 variants trained for medical text and image comprehension, helping developers accelerate healthcare AI application development.
## Quick Start

To start running the model locally on a GPU, first install the Transformers library. Gemma 3 is supported starting with transformers 4.50.0.

```bash
$ pip install -U transformers
```

Then use the following example code snippets to get started.

### Basic Usage
```python
from transformers import pipeline
from PIL import Image
import requests
import torch

# Load the multimodal pipeline on GPU in bfloat16.
pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-pt",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Fetch a sample chest X-ray (Wikimedia requires a User-Agent header).
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

# The pre-trained checkpoint expects a raw prompt with an image placeholder token.
output = pipe(
    images=image,
    text="<start_of_image> findings:",
    max_new_tokens=100,
)
print(output[0]["generated_text"])
```
### Advanced Usage
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-4b-pt"

# Load the model in bfloat16 and let Accelerate place it on available devices.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Fetch a sample chest X-ray and convert it to RGB.
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(
    requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw
).convert("RGB")

prompt = "<start_of_image> findings:"
inputs = processor(
    text=prompt, images=image, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

# Decode only the newly generated tokens, not the echoed prompt.
input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```
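The snippets above use the pre-trained checkpoint (`google/medgemma-4b-pt`), which expects a raw `<start_of_image>` prompt. If you prefer a conversational interface, the sketch below assumes the instruction-tuned variant `google/medgemma-4b-it` and the standard Gemma 3 chat-message format accepted by the `image-text-to-text` pipeline; treat it as a starting point rather than a definitive recipe.

```python
from transformers import pipeline
from PIL import Image
import requests
import torch

# Assumes the instruction-tuned checkpoint follows the standard Gemma 3 chat format.
pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

# Chat-style messages; images are embedded directly in the user turn.
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are an expert radiologist."}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray."},
        ],
    },
]

output = pipe(text=messages, max_new_tokens=200)
# The pipeline returns the full conversation; the last message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```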
## Features

- **Medical Focus**: MedGemma is a collection of Gemma 3 variants trained for performance on medical text and image comprehension.
- **Multiple Variants**: It currently comes in two variants: a 4B multimodal version and a 27B text-only version.
- **High Performance**: Outperforms the base Gemma 3 models across the tested multimodal and text-only health benchmarks.
## Documentation

### Model information

MedGemma is a collection of Gemma 3 variants trained for medical text and image comprehension. Developers can use it to accelerate building healthcare-based AI applications. It currently comes in two variants: a 4B multimodal version and a 27B text-only version.
### Model architecture overview

MedGemma is built on Gemma 3 and uses the same decoder-only transformer architecture.
### Technical specifications

| Property | Details |
|---|---|
| Model Type | Decoder-only Transformer architecture; see the Gemma 3 technical report |
| Modalities | 4B: Text, vision; 27B: Text only |
| Attention mechanism | Grouped-query attention (GQA) |
| Context length | Supports long context, at least 128K tokens |
| Key publication | Coming soon |
| Model created | May 20, 2025 |
| Model version | 1.0.0 |
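For intuition, grouped-query attention lets several query heads share one key/value head, shrinking the KV cache relative to full multi-head attention. Below is a minimal PyTorch sketch with illustrative dimensions; the head counts are placeholders, not MedGemma's actual configuration.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only: 8 query heads share 2 key/value heads (4 per group).
batch, seq, n_q_heads, n_kv_heads, head_dim = 1, 16, 8, 2, 32
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each KV head across its group of query heads, then run standard attention.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (batch, n_q_heads, seq, head_dim)
```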
### Inputs and outputs

**Input:**

- Text string, such as a question or prompt
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each (see the token-budget sketch below)
- Total input length of 128K tokens

**Output:**

- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output length of 8192 tokens
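To make the budget concrete: each image consumes a fixed 256 tokens of the input context, so the text budget shrinks linearly with the number of images. A small illustrative calculation, treating 128K as 128,000 tokens (the exact limit may differ):

```python
# Illustrative token budgeting for a multimodal prompt.
# Constants come from the "Inputs and outputs" list above.
IMAGE_TOKENS = 256          # each 896 x 896 image encodes to 256 tokens
MAX_INPUT_TOKENS = 128_000  # approximate total input context

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text after reserving space for the images."""
    return MAX_INPUT_TOKENS - num_images * IMAGE_TOKENS

# e.g. a prompt with four images leaves 126,976 tokens for text
print(remaining_text_budget(4))
```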
### Performance and validation

MedGemma was evaluated across a range of multimodal classification, report generation, visual question answering, and text-based tasks.
#### Imaging evaluations

| Task and metric | MedGemma 4B | Gemma 3 4B |
|---|---|---|
| **Medical image classification** | | |
| MIMIC CXR - Average F1 for top 5 conditions | 88.9 | 81.1 |
| CheXpert CXR - Average F1 for top 5 conditions | 48.1 | 31.2 |
| DermMCQA* - Accuracy | 71.8 | 42.6 |
| **Visual question answering** | | |
| SlakeVQA (radiology) - Tokenized F1 | 62.3 | 38.6 |
| VQA-Rad** (radiology) - Tokenized F1 | 49.9 | 38.6 |
| PathMCQA (histopathology, internal***) - Accuracy | 69.8 | 37.1 |
| **Knowledge and reasoning** | | |
| MedXpertQA (text + multimodal questions) - Accuracy | 18.8 | 16.4 |
#### Chest X-ray report generation

| Metric | MedGemma 4B (pre-trained) | PaliGemma 2 3B (tuned for CXR) | PaliGemma 2 10B (tuned for CXR) |
|---|---|---|---|
| MIMIC CXR - RadGraph F1 | 29.5 | 28.8 | 29.5 |
#### Text evaluations

| Metric | MedGemma 27B | Gemma 3 27B | MedGemma 4B | Gemma 3 4B |
|---|---|---|---|---|
| MedQA (4-op) | 89.8 (best-of-5), 87.7 (0-shot) | 74.9 | 64.4 | 50.7 |
| MedMCQA | 74.2 | 62.6 | 55.7 | 45.4 |
| PubMedQA | 76.8 | 73.4 | 73.4 | 68.4 |
| MMLU Med (text only) | 87.0 | 83.3 | 70.0 | 67.2 |
| MedXpertQA (text only) | 26.7 | 15.7 | 14.2 | 11.6 |
| AfriMed-QA | 84.0 | 72.0 | 52.0 | 48.0 |
### Ethics and safety evaluation

#### Evaluation approach

Our evaluation methods include structured evaluations and internal red-teaming of relevant content policies. These models were evaluated against categories relevant to ethics and safety, including child safety, content safety, representational harms, and general medical harms.
## Technical Details

### Model information

MedGemma 4B uses a SigLIP image encoder pre-trained on a variety of de-identified medical data, and its LLM component is trained on diverse medical data. MedGemma 4B is available in pre-trained and instruction-tuned versions. MedGemma 27B is trained exclusively on medical text and is optimized for inference-time computation.
### Citation

A technical report is coming soon. In the meantime, if you publish using this model, please cite the Hugging Face model page:

```bibtex
@misc{medgemma-hf,
    author = {Google},
    title = {MedGemma Hugging Face},
    howpublished = {\url{https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4}},
    year = {2025},
    note = {Accessed: [Insert Date Accessed, e.g., 2025-05-20]}
}
```
## License

The use of MedGemma is governed by the Health AI Developer Foundations terms of use.
## Important Note

To access MedGemma on Hugging Face, you must review and agree to the Health AI Developer Foundations terms of use. To do this, make sure you're logged in to Hugging Face and click below. Requests are processed immediately.
## Usage Tip

For all MedGemma 27B results, test-time scaling is used to improve performance. To use the model at scale, we recommend creating a production version via Model Garden.
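This card does not spell out the exact test-time scaling recipe (the MedQA row above reports a best-of-5 score). One common form is self-consistency: sample several answers and keep the most frequent. A minimal sketch, where `generate_answer` is a hypothetical stand-in for a single sampled model call, not a MedGemma API:

```python
from collections import Counter
from typing import Callable

def self_consistency(generate_answer: Callable[[str], str], question: str, k: int = 5) -> str:
    """Sample k answers (with sampling enabled) and return the majority vote."""
    answers = [generate_answer(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```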