🚀 MedCSP_clip Model Card
This repository demonstrates how to use the MedCSP CLIP model to encode medical images and text, which is useful for tasks such as zero-shot image classification.
🚀 Quick Start
Here is a demo of how to use the CLIP model for encoding:
from urllib.request import urlopen

import torch
from open_clip import create_model_from_pretrained, get_tokenizer
from PIL import Image

# Load the pretrained MedCSP CLIP model, its image preprocessor, and tokenizer.
model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')

# Download and preprocess a sample image, then add a batch dimension.
image = Image.open(urlopen("https://huggingface.co/xcwangpsu/MedCSP_clip/resolve/main/image_sample.jpg"))
processed_image = processor(image)
processed_image = torch.unsqueeze(processed_image, 0)
print("Input size:", processed_image.shape)

# Pooled image embedding (one vector per image).
image_embedding = model.encode_image(processed_image)
print("Individual image embedding size:", image_embedding.shape)

# Sequential (patch-level) image features from the vision trunk.
seq_image_embedding = model.visual.trunk.forward_features(processed_image)
print("Sequential image embedding size:", seq_image_embedding.shape)

# Tokenize a report sentence and compute the pooled text embedding.
text = "Chest X-ray reveals increased lung opacity, indicating potential fluid buildup or infection."
tokens = tokenizer(text)
text_embedding = model.encode_text(tokens)
print("Individual text embedding size:", text_embedding.shape)

# Sequential (token-level) text features from the last hidden layer of the text encoder.
seq_text_embedding = model.text.transformer(tokens, output_hidden_states=True).hidden_states[-1]
print("Sequential text embedding size:", seq_text_embedding.shape)
💻 Usage Examples
Basic Usage
The basic workflow for encoding images and text is identical to the Quick Start example above.
Advanced Usage
from urllib.request import urlopen

import torch
from open_clip import create_model_from_pretrained, get_tokenizer
from PIL import Image

# Load the pretrained MedCSP CLIP model, its image preprocessor, and tokenizer.
model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')

# Encode a sample image.
image = Image.open(urlopen("https://huggingface.co/xcwangpsu/MedCSP_clip/resolve/main/image_sample.jpg"))
processed_image = torch.unsqueeze(processor(image), 0)
image_embedding = model.encode_image(processed_image)

# Encode a report sentence.
text = "Chest X-ray reveals increased lung opacity, indicating potential fluid buildup or infection."
tokens = tokenizer(text)
text_embedding = model.encode_text(tokens)

# Measure image-text alignment with cosine similarity.
cos_sim = torch.nn.functional.cosine_similarity(image_embedding, text_embedding)
print("Cosine similarity:", cos_sim)
📄 License
This project is licensed under the MIT license.
Acknowledgement
If you find the resources provided in this repository or our paper useful, please cite our paper using the following BibTeX:
@inproceedings{wang2024unity,
  title={Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources},
  author={Wang, Xiaochen and Luo, Junyu and Wang, Jiaqi and Zhong, Yuan and Zhang, Xiaokun and Wang, Yaqing and Bhatia, Parminder and Xiao, Cao and Ma, Fenglong},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={3644--3656},
  year={2024}
}