MedCSP_clip
Model Overview
This model is a variant of OpenAI's CLIP architecture, optimized specifically for medical image classification. It supports zero-shot image classification: it can assign new categories without any task-specific training.
Key Features
Medical-domain optimization: tuned to the characteristics of medical images, making it well suited to medical imaging data.
Zero-shot learning: classifies new categories without task-specific training.
Multimodal understanding: encodes both images and text, building vision-language associations.
Capabilities
Medical image classification
Cross-modal retrieval
Zero-shot learning (a classification sketch follows the Quick Start code)
Use Cases
Medical imaging analysis
Medical image classification: classifying radiological images such as X-rays and CT scans.
Pathology image analysis: identifying abnormal tissue in pathology slides.
Medical research
Medical image retrieval: retrieving relevant medical images from a text description, as in the sketch below.
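To make the retrieval use case concrete, here is a minimal sketch that ranks a small image collection against a text query by cosine similarity. It assumes the model loads through open_clip exactly as in the Quick Start below; the image file names are hypothetical placeholders.

from open_clip import create_model_from_pretrained, get_tokenizer
from PIL import Image
import torch
# load the model as in the Quick Start below
model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')
model.eval()
# hypothetical local files standing in for an image collection
image_paths = ["xray_001.jpg", "ct_002.jpg", "xray_003.jpg"]
with torch.no_grad():
    # encode and L2-normalize every image in the collection
    images = torch.stack([processor(Image.open(p)) for p in image_paths])
    image_emb = model.encode_image(images)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    # encode and normalize the text query
    query = tokenizer("chest X-ray showing increased lung opacity")
    text_emb = model.encode_text(query)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
# after normalization, cosine similarity is a plain dot product
scores = (text_emb @ image_emb.T).squeeze(0)
for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")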
🚀 MedCSP_clip Model Card
MedCSP_clip is a CLIP-based model for zero-shot image classification. It encodes both images and text, and has significant applications in medical image analysis and related fields.
🚀 Quick Start
The following example shows how to encode images and text with the model:
from open_clip import create_model_from_pretrained, get_tokenizer
import torch
from urllib.request import urlopen
from PIL import Image
# load the model, preprocessing transform, and tokenizer
model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')
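# switch to inference mode before encoding (a sensible default; not part of the original snippet)
model.eval()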
# encode image:
# load a raw radiological image:
image = Image.open(urlopen("https://huggingface.co/xcwangpsu/MedCSP_clip/resolve/main/image_sample.jpg"))
# preprocess the image, the final tensor should have 4 dimensions (B, C, H, W)
processed_image = processor(image)
processed_image = torch.unsqueeze(processed_image, 0)
print("Input size:", processed_image.shape)
# encode to a single embedding
image_embedding = model.encode_image(processed_image)
print("Individual image embedding size:",image_embedding.shape)
# sequential encoding
seq_image_embedding = model.visual.trunk.forward_features(processed_image)
print("Sequential image embedding size:",seq_image_embedding.shape)
# encode text:
text = "Chest X-ray reveals increased lung opacity, indicating potential fluid buildup or infection."
tokens = tokenizer(text)
# encode to a single embedding
text_embedding = model.encode_text(tokens)
print("Individual text embedding size:",text_embedding.shape)
# sequential encoding
seq_text_embedding = model.text.transformer(tokens, output_hidden_states=True).hidden_states[-1]
print("Sequential text embedding size:", seq_text_embedding.shape)
📄 License
This project is licensed under the MIT License.
Citation
If you find any of the resources in this repository or our paper useful, please cite our paper with the following BibTeX:
@inproceedings{wang2024unity,
title={Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources},
author={Wang, Xiaochen and Luo, Junyu and Wang, Jiaqi and Zhong, Yuan and Zhang, Xiaokun and Wang, Yaqing and Bhatia, Parminder and Xiao, Cao and Ma, Fenglong},
booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={3644--3656},
year={2024}
}