BiomedCLIP: Open-Source Biomedical Model with Free Support for Cross-Modal Retrieval, Image Classification, and Other Tasks

Biomedclip PubMedBERT 256 Vit Base Patch16 224

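Developed by Microsoft

BiomedCLIP is a biomedical vision-language foundation model pretrained with contrastive learning on the PMC-15M dataset. It supports tasks such as cross-modal retrieval, image classification, and visual question answering.

Image-to-text | English | License: MIT | #biomedical-image-text-retrieval #zero-shot-pathology-classification #PubMedBERT-text-encoder

Downloads: 137.39k

Released: 4/5/2023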

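Model Overview

The model uses PubMedBERT as its text encoder and a Vision Transformer as its image encoder. It is adapted specifically to the biomedical domain and can handle diverse biomedical image types.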

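Model Features

Biomedical domain specific

Adapted specifically to the biomedical domain; handles diverse biomedical image types such as microscopy, radiography, and histology.

Large-scale pretraining

Pretrained on PMC-15M, a dataset of 15 million figure-caption pairs covering a wide range of biomedical image types.

Multi-task support

Supports a range of vision-language processing tasks, including cross-modal retrieval, image classification, and visual question answering.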

Model Capabilities

Biomedical image classification

Cross-modal retrieval

Visual question answering

Zero-shot learning

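Use Cases

Medical image analysis

Histopathology image classification

Distinguishes histopathology image types such as adenocarcinoma and squamous cell carcinoma

Set a new state of the art on standard datasets

Radiology image analysis

Identifies radiological findings such as pleural effusion

Medical research

Medical literature image retrieval

Retrieves relevant medical images from a text description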
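
Text-to-image retrieval of this kind is commonly done by embedding the query and the candidate images in a shared space and ranking the images by cosine similarity. The sketch below illustrates only that ranking step. `rank_images` and the 3-dimensional vectors are hypothetical stand-ins for real encoder outputs and are not part of the BiomedCLIP or open_clip API; all vectors are assumed to be non-zero.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_images(text_embedding, image_embeddings):
    """Return image indices ordered from most to least similar to the text query."""
    query = l2_normalize(text_embedding)
    scored = []
    for idx, emb in enumerate(image_embeddings):
        unit = l2_normalize(emb)
        similarity = sum(q * v for q, v in zip(query, unit))
        scored.append((similarity, idx))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


# Hypothetical embeddings; real ones would come from the text and image encoders.
text_query = [1.0, 0.0, 0.0]
image_bank = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.1, 0.0],  # nearly parallel to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
ranking = rank_images(text_query, image_bank)
print(ranking)
```

In practice the image embeddings would be computed once and cached, so each new text query needs only one encoder pass plus the similarity ranking.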

🚀 BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

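BiomedCLIP is a biomedical vision-language foundation model pretrained with contrastive learning on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations. It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches.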
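
Contrastive pretraining in CLIP-style models is usually implemented as a symmetric cross-entropy over the image-text similarity matrix of a batch. The following is a minimal pure-Python sketch of that general objective, not the BiomedCLIP training code: the function names, the `logit_scale` value of 100, and the toy embeddings are all assumptions made for illustration, and the actual objective and hyperparameters are described in the referenced paper.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_total for x in row]


def contrastive_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric cross-entropy over an image-text similarity matrix.

    Assumes every embedding has unit length and that image i is paired
    with text i. logit_scale is a hypothetical temperature value.
    """
    n = len(image_embs)
    logits = [
        [logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
        for img in image_embs
    ]
    # Image-to-text direction: each row should peak on the diagonal.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each column should peak on the diagonal.
    columns = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax(columns[i])[i] for i in range(n)) / n
    return (i2t + t2i) / 2


# Toy unit-length embeddings: matched pairs versus deliberately swapped pairs.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = contrastive_loss(aligned, aligned)
loss_mismatched = contrastive_loss(aligned, swapped)
print(loss_matched, loss_mismatched)
```

The loss is small when each image is most similar to its own caption and large when the pairs are swapped, which is the signal that pulls matching image and text embeddings together during pretraining.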

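🚀 Quick Start

This model can be used for vision-language processing tasks such as zero-shot image classification. The sections below describe how to use it.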

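✨ Key Features

Multimodal processing: combines biomedical images and text to perform vision-language processing tasks such as cross-modal retrieval, image classification, and visual question answering.
Domain adaptation: uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations for the biomedical field.
Strong performance: establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches.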

📦 Installation

Environment Setup

conda create -n biomedclip python=3.10 -y
conda activate biomedclip
pip install open_clip_torch==2.23.0 transformers==4.35.2 matplotlib
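
After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `check_environment` and the `EXPECTED` table are hypothetical and simply mirror the pins in the pip command above.

```python
from importlib import metadata

# Pins copied from the pip command above; matplotlib is left unpinned there.
EXPECTED = {"open_clip_torch": "2.23.0", "transformers": "4.35.2", "matplotlib": None}


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and (pinned is None or installed == pinned)
        report[name] = (installed, ok)
    return report


for package, (installed, ok) in check_environment().items():
    print(f"{package}: {installed or 'not installed'} ({'ok' if ok else 'check'})")
```

A package reported as "check" is either missing or installed at a different version from the pin, which is worth resolving before running the examples.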

💻 Usage Examples

Basic Usage

Loading the model from the Hugging Face Hub

import torch
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

# Load the model and its configuration from the Hugging Face Hub
model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')


# Zero-shot image classification
template = 'this is a photo of '
labels = [
    'adenocarcinoma histopathology',
    'brain MRI',
    'covid line chart',
    'squamous cell carcinoma histopathology',
    'immunohistochemistry histopathology',
    'bone X-ray',
    'chest X-ray',
    'pie chart',
    'hematoxylin and eosin histopathology'
]

dataset_url = 'https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/'
test_imgs = [
    'squamous_cell_carcinoma_histopathology.jpeg',
    'H_and_E_histopathology.jpg',
    'bone_X-ray.jpg',
    'adenocarcinoma_histopathology.jpg',
    'covid_line_chart.png',
    'IHC_histopathology.jpg',
    'chest_X-ray.jpg',
    'brain_MRI.jpg',
    'pie_chart.png'
]
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.eval()

context_length = 256

images = torch.stack([preprocess(Image.open(urlopen(dataset_url + img))) for img in test_imgs]).to(device)
texts = tokenizer([template + l for l in labels], context_length=context_length).to(device)
with torch.no_grad():
    image_features, text_features, logit_scale = model(images, texts)

    logits = (logit_scale * image_features @ text_features.t()).detach().softmax(dim=-1)
    sorted_indices = torch.argsort(logits, dim=-1, descending=True)

    logits = logits.cpu().numpy()
    sorted_indices = sorted_indices.cpu().numpy()

top_k = len(labels)  # print every label, ranked by probability

for i, img in enumerate(test_imgs):
    print(img.split('/')[-1] + ':')
    for j in range(top_k):
        jth_index = sorted_indices[i][j]
        print(f'{labels[jth_index]}: {logits[i][jth_index]}')
    print('\n')
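
The classification step above multiplies the image-text similarities by `logit_scale`, applies a softmax across the labels, and sorts the result. The sketch below reproduces that logic for a single image in plain Python so the arithmetic is easy to follow. The similarity values, the scale of 100, and the helper names are hypothetical and do not come from the model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(similarities, labels, logit_scale, k=3):
    """Return the k most probable (label, probability) pairs for one image."""
    probs = softmax([logit_scale * s for s in similarities])
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical cosine similarities between one image and three label prompts.
demo_labels = ["chest X-ray", "brain MRI", "pie chart"]
demo_sims = [0.31, 0.22, 0.05]
top = top_k_labels(demo_sims, demo_labels, logit_scale=100.0, k=2)
for label, prob in top:
    print(f"{label}: {prob:.4f}")
```

Because the similarities are scaled before the softmax, even modest gaps between them turn into sharply peaked probabilities, so the printed scores reflect relative ranking among the supplied labels rather than calibrated confidence.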

Loading the model from local files

import json

from urllib.request import urlopen
from PIL import Image
import torch
from huggingface_hub import hf_hub_download
from open_clip import create_model_and_transforms, get_tokenizer
from open_clip.factory import HF_HUB_PREFIX, _MODEL_CONFIGS


# Download the model weights and configuration file
hf_hub_download(
    repo_id="microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",
    filename="open_clip_pytorch_model.bin",
    local_dir="checkpoints"
)
hf_hub_download(
    repo_id="microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",
    filename="open_clip_config.json",
    local_dir="checkpoints"
)


# Load the model and configuration from the local files
model_name = "biomedclip_local"

with open("checkpoints/open_clip_config.json", "r") as f:
    config = json.load(f)
    model_cfg = config["model_cfg"]
    preprocess_cfg = config["preprocess_cfg"]


if (not model_name.startswith(HF_HUB_PREFIX)
    and model_name not in _MODEL_CONFIGS
    and config is not None):
    _MODEL_CONFIGS[model_name] = model_cfg

tokenizer = get_tokenizer(model_name)

model, _, preprocess = create_model_and_transforms(
    model_name=model_name,
    pretrained="checkpoints/open_clip_pytorch_model.bin",
    **{f"image_{k}": v for k, v in preprocess_cfg.items()},
)


# Zero-shot image classification
template = 'this is a photo of '
labels = [
    'adenocarcinoma histopathology',
    'brain MRI',
    'covid line chart',
    'squamous cell carcinoma histopathology',
    'immunohistochemistry histopathology',
    'bone X-ray',
    'chest X-ray',
    'pie chart',
    'hematoxylin and eosin histopathology'
]

dataset_url = 'https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/'
test_imgs = [
    'squamous_cell_carcinoma_histopathology.jpeg',
    'H_and_E_histopathology.jpg',
    'bone_X-ray.jpg',
    'adenocarcinoma_histopathology.jpg',
    'covid_line_chart.png',
    'IHC_histopathology.jpg',
    'chest_X-ray.jpg',
    'brain_MRI.jpg',
    'pie_chart.png'
]
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.eval()

context_length = 256

images = torch.stack([preprocess(Image.open(urlopen(dataset_url + img))) for img in test_imgs]).to(device)
texts = tokenizer([template + l for l in labels], context_length=context_length).to(device)
with torch.no_grad():
    image_features, text_features, logit_scale = model(images, texts)

    logits = (logit_scale * image_features @ text_features.t()).detach().softmax(dim=-1)
    sorted_indices = torch.argsort(logits, dim=-1, descending=True)

    logits = logits.cpu().numpy()
    sorted_indices = sorted_indices.cpu().numpy()

top_k = len(labels)  # print every label, ranked by probability

for i, img in enumerate(test_imgs):
    print(img.split('/')[-1] + ':')
    for j in range(top_k):
        jth_index = sorted_indices[i][j]
        print(f'{labels[jth_index]}: {logits[i][jth_index]}')
    print('\n')
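
In the local-loading example above, the dictionary comprehension passed to `create_model_and_transforms` renames every key of `preprocess_cfg` with an `image_` prefix, so the settings arrive as keyword arguments such as `image_mean` and `image_std`. The sketch below isolates that step. The helper name and the configuration values are hypothetical; the real values come from `open_clip_config.json`.

```python
def prefixed_kwargs(preprocess_cfg, prefix="image_"):
    """Prefix every key so the settings can be passed as keyword arguments."""
    return {f"{prefix}{key}": value for key, value in preprocess_cfg.items()}


# Hypothetical values for illustration only.
example_cfg = {"mean": [0.5, 0.5, 0.5], "std": [0.25, 0.25, 0.25]}
kwargs = prefixed_kwargs(example_cfg)
print(kwargs)
```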

Advanced Usage

Using the model in a Jupyter Notebook

Please refer to the example notebook.

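Intended Use

This model is intended solely for (I) future research on vision-language processing and (II) reproducing the experimental results reported in the reference paper.

Primary Intended Use

The primary intended use is to support AI researchers building on top of this work. BiomedCLIP and its associated models should be helpful for exploring various biomedical VLP research questions, especially in the radiology domain.

Out-of-Scope Use

Any deployed use case of the model, commercial or otherwise, is currently out of scope. Although the models were evaluated on a broad set of publicly available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to the associated paper for more details.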

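📚 Documentation

Training Data

The BiomedCLIP data pipeline is released at https://github.com/microsoft/BiomedCLIP_data_pipeline. It automatically downloads and processes a set of articles from the PubMed Central Open Access dataset. BiomedCLIP builds on PMC-15M, a large-scale parallel image-text dataset generated by this pipeline for biomedical vision-language processing. PMC-15M contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central and covers a diverse range of biomedical image types, such as microscopy, radiography, and histology.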

References

@article{zhang2024biomedclip,
  title={A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs},
  author={Sheng Zhang and Yanbo Xu and Naoto Usuyama and Hanwen Xu and Jaspreet Bagga and Robert Tinn and Sam Preston and Rajesh Rao and Mu Wei and Naveen Valluri and Cliff Wong and Andrea Tupini and Yu Wang and Matt Mazzola and Swadheen Shukla and Lars Liden and Jianfeng Gao and Angela Crabtree and Brian Piening and Carlo Bifulco and Matthew P. Lungren and Tristan Naumann and Sheng Wang and Hoifung Poon},
  journal={NEJM AI},
  year={2024},
  volume={2},
  number={1},
  doi={10.1056/AIoa2400640},
  url={https://ai.nejm.org/doi/full/10.1056/AIoa2400640}
}