BiomedCLIP open-source biomedical model: free support for cross-modal retrieval, image classification, and other tasks

Biomedclip PubMedBERT 256 Vit Base Patch16 224

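Developed by microsoft

BiomedCLIP is a biomedical vision-language foundation model pretrained with contrastive learning on the PMC-15M dataset. It supports tasks such as cross-modal retrieval, image classification, and visual question answering.

Image-to-text | English | Open-source license: MIT | #biomedical image retrieval #zero-shot pathology classification #PubMedBERT text encoding

Downloads: 137.39k

Released: 4/5/2023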

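Model overview

The model uses PubMedBERT as its text encoder and a Vision Transformer as its image encoder. It is optimized specifically for the biomedical domain and can handle a wide variety of biomedical image types.

Model features

Built for the biomedical domain

Optimized specifically for the biomedical domain, the model handles diverse biomedical image types such as microscopy, radiography, and histology.

Large-scale pretraining

Pretrained on PMC-15M, a dataset of 15 million image-caption pairs that covers a broad range of biomedical image types.

Multi-task support

Supports a range of vision-language processing tasks, including cross-modal retrieval, image classification, and visual question answering.

Model capabilities

Biomedical image classification

Cross-modal retrieval

Visual question answering

Zero-shot learning

Use cases

Medical image analysis

Histopathology image classification

Distinguishes histopathology image types such as adenocarcinoma and squamous cell carcinoma

Sets a new state of the art on standard datasets

Radiology image analysis

Identifies radiological findings such as pleural effusion

Medical research

Medical literature image retrieval

Retrieves relevant medical images from a text description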

🚀 BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

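BiomedCLIP is a biomedical vision-language foundation model pretrained with contrastive learning on PMC-15M, a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations. It can perform a variety of vision-language processing (VLP) tasks, such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches.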
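To make "contrastive learning" concrete, the following is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. It is a generic illustration written in plain Python, not the BiomedCLIP training code; the function names, the toy 2-d embeddings, and the `logit_scale` value of 10.0 are all assumptions made for the example.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_style_loss(image_embs, text_embs, logit_scale=10.0):
    """Symmetric InfoNCE loss; image i and caption i form the positive pair."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and caption j
    logits = [
        [logit_scale * sum(a * b for a, b in zip(imgs[i], txts[j])) for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: the correct caption for image i is column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: transpose, then the correct image is at index j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Hypothetical toy batch: correctly paired vs. deliberately mismatched captions.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each image is most similar to its own caption and high when the pairs are mismatched, which is the signal that pulls matching figure-caption pairs together during pretraining.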

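🚀 Quick start

✨ Key features

BiomedCLIP is a model specialized for vision-language processing tasks in the biomedical domain. It offers the following capabilities:

Cross-modal retrieval: search in both directions between images and text.
Image classification: classify biomedical images.
Visual question answering: answer questions about an image.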
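As an illustration of the cross-modal retrieval capability listed above, here is a minimal sketch of text-to-image retrieval by cosine similarity. The 3-d vectors are hypothetical stand-ins for the embeddings the model would produce, and `cosine` and `retrieve` are helper names invented for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, gallery, top_k=2):
    """Return the top_k (name, score) pairs ranked by similarity to the query."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical image embeddings (in practice these come from the image encoder).
gallery = {
    'chest_X-ray.jpg': [0.9, 0.1, 0.0],
    'brain_MRI.jpg': [0.1, 0.9, 0.1],
    'pie_chart.png': [0.0, 0.1, 0.9],
}
# Hypothetical text embedding for a query such as "chest X-ray".
query = [0.8, 0.2, 0.0]

results = retrieve(query, gallery)
print(results[0][0])
```

Image-to-text retrieval works the same way with the roles swapped: the query is an image embedding and the gallery holds caption embeddings.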

📦 Installation

Environment setup

conda create -n biomedclip python=3.10 -y
conda activate biomedclip
pip install open_clip_torch==2.23.0 transformers==4.35.2 matplotlib
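After installation, it can be useful to confirm that the environment matches the pinned versions. The sketch below uses only the standard-library `importlib.metadata`; the `PINNED` dictionary and the `check_pins` helper are names introduced for this example and simply mirror the versions in the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip command above.
PINNED = {'open_clip_torch': '2.23.0', 'transformers': '4.35.2'}


def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_pins(PINNED).items():
    print(f'{name}: installed={installed} pinned={PINNED[name]} match={ok}')
```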

💻 Usage examples

Basic usage

1. Loading from the Hugging Face Hub

import torch
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

# Load the model and config files from the Hugging Face Hub
model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')


# Zero-shot image classification
template = 'this is a photo of '
labels = [
    'adenocarcinoma histopathology',
    'brain MRI',
    'covid line chart',
    'squamous cell carcinoma histopathology',
    'immunohistochemistry histopathology',
    'bone X-ray',
    'chest X-ray',
    'pie chart',
    'hematoxylin and eosin histopathology'
]

dataset_url = 'https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/'
test_imgs = [
    'squamous_cell_carcinoma_histopathology.jpeg',
    'H_and_E_histopathology.jpg',
    'bone_X-ray.jpg',
    'adenocarcinoma_histopathology.jpg',
    'covid_line_chart.png',
    'IHC_histopathology.jpg',
    'chest_X-ray.jpg',
    'brain_MRI.jpg',
    'pie_chart.png'
]
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.eval()

context_length = 256

images = torch.stack([preprocess(Image.open(urlopen(dataset_url + img))) for img in test_imgs]).to(device)
texts = tokenizer([template + l for l in labels], context_length=context_length).to(device)
with torch.no_grad():
    image_features, text_features, logit_scale = model(images, texts)

    logits = (logit_scale * image_features @ text_features.t()).detach().softmax(dim=-1)
    sorted_indices = torch.argsort(logits, dim=-1, descending=True)

    logits = logits.cpu().numpy()
    sorted_indices = sorted_indices.cpu().numpy()

# Number of predictions to print per image; -1 means "print every label"
top_k = -1
top_k = len(labels) if top_k == -1 else top_k

for i, img in enumerate(test_imgs):
    print(img.split('/')[-1] + ':')
    # Labels are printed from most to least probable
    for j in range(top_k):
        jth_index = sorted_indices[i][j]
        print(f'{labels[jth_index]}: {logits[i][jth_index]}')
    print('\n')
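The scoring step in the example above (scaled similarity, softmax over labels, then sorting) can be followed by hand on a toy case. This is a plain-Python sketch under stated assumptions: the 2-d unit vectors are hypothetical stand-ins for the features returned by the model, `logit_scale=10.0` is an arbitrary illustrative value, and `softmax` and `zero_shot_probs` are helper names made up for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, label_embs, logit_scale):
    """Softmax over labels of logit_scale * dot(image, label).

    Embeddings are assumed to be unit-norm already.
    """
    logits = [
        logit_scale * sum(a * b for a, b in zip(image_emb, emb))
        for emb in label_embs
    ]
    return softmax(logits)


labels = ['chest X-ray', 'brain MRI', 'pie chart']
label_embs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]  # hypothetical text features
image_emb = [0.8, 0.6]                               # hypothetical image feature

probs = zero_shot_probs(image_emb, label_embs, logit_scale=10.0)
ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])
```

Because the softmax runs across the label dimension, the scores for one image sum to 1 and depend on which candidate labels are supplied; adding or removing labels changes every probability.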

2. Loading from local files

import json

from urllib.request import urlopen
from PIL import Image
import torch
from huggingface_hub import hf_hub_download
from open_clip import create_model_and_transforms, get_tokenizer
from open_clip.factory import HF_HUB_PREFIX, _MODEL_CONFIGS


# Download the model and config files
hf_hub_download(
    repo_id="microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",
    filename="open_clip_pytorch_model.bin",
    local_dir="checkpoints"
)
hf_hub_download(
    repo_id="microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",
    filename="open_clip_config.json",
    local_dir="checkpoints"
)


# Load the model and config files
model_name = "biomedclip_local"

with open("checkpoints/open_clip_config.json", "r") as f:
    config = json.load(f)
    model_cfg = config["model_cfg"]
    preprocess_cfg = config["preprocess_cfg"]


if (not model_name.startswith(HF_HUB_PREFIX)
    and model_name not in _MODEL_CONFIGS
    and config is not None):
    _MODEL_CONFIGS[model_name] = model_cfg

tokenizer = get_tokenizer(model_name)

model, _, preprocess = create_model_and_transforms(
    model_name=model_name,
    pretrained="checkpoints/open_clip_pytorch_model.bin",
    **{f"image_{k}": v for k, v in preprocess_cfg.items()},
)


# Zero-shot image classification
template = 'this is a photo of '
labels = [
    'adenocarcinoma histopathology',
    'brain MRI',
    'covid line chart',
    'squamous cell carcinoma histopathology',
    'immunohistochemistry histopathology',
    'bone X-ray',
    'chest X-ray',
    'pie chart',
    'hematoxylin and eosin histopathology'
]

dataset_url = 'https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/'
test_imgs = [
    'squamous_cell_carcinoma_histopathology.jpeg',
    'H_and_E_histopathology.jpg',
    'bone_X-ray.jpg',
    'adenocarcinoma_histopathology.jpg',
    'covid_line_chart.png',
    'IHC_histopathology.jpg',
    'chest_X-ray.jpg',
    'brain_MRI.jpg',
    'pie_chart.png'
]
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.eval()

context_length = 256

images = torch.stack([preprocess(Image.open(urlopen(dataset_url + img))) for img in test_imgs]).to(device)
texts = tokenizer([template + l for l in labels], context_length=context_length).to(device)
with torch.no_grad():
    image_features, text_features, logit_scale = model(images, texts)

    logits = (logit_scale * image_features @ text_features.t()).detach().softmax(dim=-1)
    sorted_indices = torch.argsort(logits, dim=-1, descending=True)

    logits = logits.cpu().numpy()
    sorted_indices = sorted_indices.cpu().numpy()

# Number of predictions to print per image; -1 means "print every label"
top_k = -1
top_k = len(labels) if top_k == -1 else top_k

for i, img in enumerate(test_imgs):
    print(img.split('/')[-1] + ':')
    # Labels are printed from most to least probable
    for j in range(top_k):
        jth_index = sorted_indices[i][j]
        print(f'{labels[jth_index]}: {logits[i][jth_index]}')
    print('\n')
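The local-loading example passes the preprocessing settings to `create_model_and_transforms` by prefixing each key of `preprocess_cfg` with `image_`. The sketch below isolates that step so it can be inspected without downloading anything. The config contents are hypothetical values chosen for illustration, not the real contents of `open_clip_config.json`, and `preprocess_kwargs` is a helper name introduced here.

```python
import json


def preprocess_kwargs(config_text):
    """Parse an open_clip-style config and prefix preprocess keys with 'image_'."""
    config = json.loads(config_text)
    model_cfg = config['model_cfg']
    kwargs = {f'image_{k}': v for k, v in config['preprocess_cfg'].items()}
    return model_cfg, kwargs


# Hypothetical config contents, for illustration only.
example = json.dumps({
    'model_cfg': {'embed_dim': 512},
    'preprocess_cfg': {'mean': [0.5, 0.5, 0.5], 'std': [0.25, 0.25, 0.25]},
})

model_cfg, kwargs = preprocess_kwargs(example)
print(sorted(kwargs))
```

With this toy config the resulting keyword arguments are `image_mean` and `image_std`, which is the form the dictionary unpacking in the loading code relies on.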

Using the model in a Jupyter Notebook

See the example notebook for details.

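Intended use

This model is intended to be used solely for (I) future research on vision-language processing and (II) reproducibility of the experimental results reported in the reference paper.

Primary intended use

The primary intended use is to support AI researchers building on this work. BiomedCLIP and its associated models should be helpful for exploring a variety of biomedical VLP research questions, especially in the radiology domain.

Out-of-scope use

Any deployed use case of the model, commercial or otherwise, is currently out of scope. Although the model was evaluated on a broad set of publicly available research benchmarks, the model and its evaluations are not intended for deployed use cases. See the associated paper for details.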

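📚 Documentation

Training data

The BiomedCLIP Data Pipeline is released at https://github.com/microsoft/BiomedCLIP_data_pipeline. It automatically downloads and processes a set of articles from the PubMed Central Open Access dataset. BiomedCLIP builds on PMC-15M, a large-scale parallel image-text dataset generated by this data pipeline. PMC-15M contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central and covers a diverse range of biomedical image types, such as microscopy, radiography, and histology.

References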

@article{zhang2024biomedclip,
  title={A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs},
  author={Sheng Zhang and Yanbo Xu and Naoto Usuyama and Hanwen Xu and Jaspreet Bagga and Robert Tinn and Sam Preston and Rajesh Rao and Mu Wei and Naveen Valluri and Cliff Wong and Andrea Tupini and Yu Wang and Matt Mazzola and Swadheen Shukla and Lars Liden and Jianfeng Gao and Angela Crabtree and Brian Piening and Carlo Bifulco and Matthew P. Lungren and Tristan Naumann and Sheng Wang and Hoifung Poon},
  journal={NEJM AI},
  year={2024},
  volume={2},
  number={1},
  doi={10.1056/AIoa2400640},
  url={https://ai.nejm.org/doi/full/10.1056/AIoa2400640}
}