Model Card for PubMedCLIP
PubMedCLIP is a fine-tuned version of CLIP tailored for the medical domain, offering enhanced performance in medical image-text understanding.
🚀 Quick Start
PubMedCLIP is a fine-tuned version of CLIP for the medical domain.
✨ Features
- Fine-tuned for the medical domain, leveraging the power of CLIP in a specialized area.
- Trained on ROCO, a large-scale multimodal medical imaging dataset.
📦 Installation
The original card provides no dedicated installation steps; the model loads directly through the Hugging Face transformers library, as shown in the usage example.
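Since the usage example relies on transformers, PyTorch, Pillow, matplotlib, and requests, the dependencies can be installed with pip. The package names below are the usual PyPI ones, not taken from the original card:

```shell
pip install transformers torch pillow matplotlib requests
```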
💻 Usage Examples
Basic Usage
import requests
from PIL import Image
import matplotlib.pyplot as plt
from transformers import CLIPProcessor, CLIPModel

# Load the PubMedCLIP checkpoint (ViT-B/32 image encoder) and its processor.
model = CLIPModel.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")

# Sample image and candidate labels for zero-shot classification.
url = "https://huggingface.co/flaviagiammarino/pubmed-clip-vit-base-patch32/resolve/main/scripts/input.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
text = ["Chest X-Ray", "Brain MRI", "Abdominal CT Scan"]

# Score the image against each label and convert the logits to probabilities.
inputs = processor(text=text, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1).squeeze()

# Display the image with the per-label probabilities as the title.
plt.subplots()
plt.imshow(image)
plt.title("\n".join(f"{label}: {prob:.4%}" for label, prob in zip(text, probs)))
plt.axis("off")
plt.tight_layout()
plt.show()
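Beyond zero-shot classification, the same checkpoint can embed images and text into CLIP's shared vector space, e.g. for retrieval. This is a minimal sketch using the standard transformers `get_image_features`/`get_text_features` API, not part of the original card:

```python
import torch
import requests
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")

url = "https://huggingface.co/flaviagiammarino/pubmed-clip-vit-base-patch32/resolve/main/scripts/input.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
captions = ["Chest X-Ray", "Brain MRI", "Abdominal CT Scan"]

with torch.no_grad():
    # Project the image and each caption into the shared CLIP embedding space.
    image_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
    text_emb = model.get_text_features(**processor(text=captions, return_tensors="pt", padding=True))

# Cosine similarity: L2-normalize the embeddings, then take dot products.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
similarity = image_emb @ text_emb.T  # shape (1, 3)
best = captions[similarity.argmax().item()]
print(best)
```

The highest-similarity caption matches the zero-shot prediction from the example above, since the softmax there is a monotonic function of these same similarities.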

📚 Documentation
Model Description
PubMedCLIP was trained on the Radiology Objects in COntext (ROCO) dataset, a large-scale multimodal medical imaging dataset.
The ROCO dataset includes diverse imaging modalities (such as X-Ray, MRI, ultrasound, and fluoroscopy) from various human body regions (such as head, spine, chest, and abdomen),
captured from open-access PubMed articles.
PubMedCLIP was trained for 50 epochs with a batch size of 64 using the Adam optimizer with a learning rate of 10⁻⁵.
The authors have released three different pre-trained models at this [link](https://1drv.ms/u/s!ApXgPqe9kykTgwD4Np3-f7ODAot8?e=zLVlJ2)
which use ResNet-50, ResNet-50x4 and ViT32 as image encoders. This repository includes only the ViT32 variant of the PubMedCLIP model.
🔧 Technical Details
PubMedCLIP was trained on the ROCO dataset, a large-scale multimodal medical imaging dataset. The training was carried out for 50 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 10⁻⁵. Three different pre-trained models were released, using ResNet-50, ResNet-50x4 and ViT32 as image encoders; this repository only contains the ViT32 variant.
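The training recipe described above (contrastive fine-tuning of CLIP on image-caption pairs) can be sketched with the transformers API. This is a hedged sketch, not the authors' training code: ROCO data loading is omitted, and the base checkpoint name is the standard OpenAI ViT-B/32 release:

```python
import torch
from transformers import CLIPProcessor, CLIPModel

# Start from the base CLIP ViT-B/32 checkpoint and fine-tune it.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# Hyperparameters from the card: Adam with learning rate 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def train_step(images, captions):
    """One contrastive update on a batch of (image, caption) pairs."""
    inputs = processor(text=captions, images=images, return_tensors="pt",
                       padding=True, truncation=True)
    # return_loss=True makes CLIPModel compute the symmetric contrastive
    # (InfoNCE) loss over the in-batch image-text similarity matrix.
    outputs = model(**inputs, return_loss=True)
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()

# The card reports 50 epochs with batch size 64 over ROCO (loader not shown).
```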
📄 License
The authors have released the model code and pre-trained checkpoints under the MIT License.
Citation Information
@article{eslami2021does,
  title={Does CLIP benefit visual question answering in the medical domain as much as it does in the general domain?},
  author={Eslami, Sedigheh and de Melo, Gerard and Meinel, Christoph},
  journal={arXiv preprint arXiv:2112.13906},
  year={2021}
}