MedCSP_clip
Model Overview
This model is a variant of the OpenAI CLIP architecture, optimized for medical image classification. It performs zero-shot image classification: new categories can be classified without any task-specific training.
Model Features
Medical-domain optimization
Tuned specifically for the characteristics of medical images, making it well suited to medical imaging data
Zero-shot learning
Classifies new categories without task-specific training
Multimodal understanding
Understands image and text inputs jointly, building vision-language associations
Model Capabilities
Medical image classification
Cross-modal retrieval
Zero-shot learning (see the classification sketch below)
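Zero-shot classification with a CLIP-style model works by embedding the image and a set of candidate label prompts into the same space, then picking the label whose embedding is most similar to the image's. The following is a minimal sketch built on the open_clip API used in the quick start below; the candidate labels and the softmax-over-scaled-similarities recipe are standard CLIP practice and illustrative assumptions, not something specified by this model card.

import torch
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')

# Hypothetical candidate labels; replace with the classes relevant to your task.
labels = ["a chest X-ray showing pneumonia", "a normal chest X-ray"]

image = Image.open(urlopen("https://huggingface.co/xcwangpsu/MedCSP_clip/resolve/main/image_sample.jpg"))

with torch.no_grad():
    image_features = model.encode_image(processor(image).unsqueeze(0))  # (1, D)
    text_features = model.encode_text(tokenizer(labels))                # (num_labels, D)

    # L2-normalize so the dot product becomes cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Scaled similarities -> probabilities over the candidate labels.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")

Prompt wording matters for CLIP-style models: templated clinical phrasings (e.g. "a chest X-ray showing ...") typically score better than bare class names.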
Use Cases
Medical imaging analysis
Medical image classification
Classify X-ray, CT, and other medical images
Pathology image analysis
Identify abnormal tissue in pathology slides
Medical research
Medical image retrieval
Retrieve relevant medical images from a text description (see the retrieval sketch after this list)
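Text-to-image retrieval uses the same two encoders: embed a text query and every candidate image, L2-normalize, and rank the images by cosine similarity to the query. A minimal sketch under the same open_clip setup; the local image paths are placeholders for your own collection, not files shipped with the model.

import torch
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')

# Placeholder paths to a local image collection.
image_paths = ["scan_001.jpg", "scan_002.jpg", "scan_003.jpg"]
query = "Chest X-ray showing increased lung opacity"

with torch.no_grad():
    # Embed and L2-normalize every candidate image.
    images = torch.stack([processor(Image.open(p)) for p in image_paths])
    image_features = model.encode_image(images)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

    # Embed and normalize the query, then score by cosine similarity.
    text_features = model.encode_text(tokenizer([query]))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(-1)  # (num_images,)

# Print images from most to least similar to the query.
for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
    print(f"{rank}. {image_paths[idx]} (score: {scores[idx].item():.3f})")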
🚀 MedCSP_clip Model Card
MedCSP_clip is a CLIP-based model for zero-shot image classification. It encodes both images and text, with important applications in medical image analysis and related fields.
🚀 Quick Start
The following example shows how to encode images and text with the model:
from open_clip import create_model_from_pretrained, get_tokenizer
import torch
from urllib.request import urlopen
from PIL import Image

# load model, image processor and tokenizer
model, processor = create_model_from_pretrained('hf-hub:xcwangpsu/MedCSP_clip')
tokenizer = get_tokenizer('hf-hub:xcwangpsu/MedCSP_clip')

# encode image:
# load a raw radiological image:
image = Image.open(urlopen("https://huggingface.co/xcwangpsu/MedCSP_clip/resolve/main/image_sample.jpg"))

# preprocess the image; the final tensor should have 4 dimensions (B, C, H, W)
processed_image = processor(image)
processed_image = torch.unsqueeze(processed_image, 0)
print("Input size:", processed_image.shape)

# encode to a single embedding
image_embedding = model.encode_image(processed_image)
print("Individual image embedding size:", image_embedding.shape)

# sequential encoding (per-patch features)
seq_image_embedding = model.visual.trunk.forward_features(processed_image)
print("Sequential image embedding size:", seq_image_embedding.shape)

# encode text:
text = "Chest X-ray reveals increased lung opacity, indicating potential fluid buildup or infection."
tokens = tokenizer(text)

# encode to a single embedding
text_embedding = model.encode_text(tokens)
print("Individual text embedding size:", text_embedding.shape)

# sequential encoding (per-token hidden states)
seq_text_embedding = model.text.transformer(tokens, output_hidden_states=True).hidden_states[-1]
print("Sequential text embedding size:", seq_text_embedding.shape)
📄 License
This project is released under the MIT License.
Acknowledgements
If you find any resources in this repository or our paper useful, please cite our paper with the following BibTeX:
@inproceedings{wang2024unity,
  title={Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources},
  author={Wang, Xiaochen and Luo, Junyu and Wang, Jiaqi and Zhong, Yuan and Zhang, Xiaokun and Wang, Yaqing and Bhatia, Parminder and Xiao, Cao and Ma, Fenglong},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={3644--3656},
  year={2024}
}