BiomedCLIP: a free, open-source biomedical model for cross-modal retrieval, image classification, and related tasks


Biomedclip PubMedBERT 256 Vit Base Patch16 224

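Developed by Microsoft

BiomedCLIP is a biomedical vision-language foundation model pretrained with contrastive learning on the PMC-15M dataset. It supports tasks such as cross-modal retrieval, image classification, and visual question answering.

Task: Image-to-Text; Language: English; License: MIT; Tags: #biomedical image-text retrieval #zero-shot pathology classification #PubMedBERT text encoder

Downloads: 137.39k

Released: 4/5/2023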

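Model Overview

The model uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder. It is adapted specifically to the biomedical domain and can handle a wide range of biomedical image types.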

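Model Features

Biomedical domain focus

Adapted specifically to the biomedical domain; handles diverse biomedical image types, including microscopy, radiography, and histology images.

Large-scale pretraining

Pretrained on PMC-15M, a dataset of 15 million figure-caption pairs that covers a broad range of biomedical image types.

Multi-task support

Supports a variety of vision-language processing tasks, including cross-modal retrieval, image classification, and visual question answering.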

Model Capabilities

Biomedical image classification

Cross-modal retrieval

Visual question answering

Zero-shot learning

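Use Cases

Medical image analysis

Histopathology image classification

Distinguishes histopathology image types such as adenocarcinoma and squamous cell carcinoma

Set a new state of the art on standard datasets

Radiology image analysis

Identifies radiographic findings such as pleural effusion

Medical research

Medical literature image retrieval

Retrieves relevant medical images from a text description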
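
As a rough illustration of text-to-image retrieval, the sketch below ranks precomputed image embeddings by cosine similarity to a text embedding. The vectors, file names, and the `retrieve` helper are made up for this example; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(text_embedding, image_embeddings, top_k=2):
    """Return the top_k (image_name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(text_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings, for illustration only.
image_embeddings = {
    "chest_xray.jpg": [0.9, 0.1, 0.0],
    "brain_mri.jpg": [0.1, 0.9, 0.1],
    "pie_chart.png": [0.0, 0.1, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # stands in for an encoded text query

results = retrieve(query_embedding, image_embeddings)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Image-to-text retrieval works the same way with the roles swapped: one image embedding is scored against a set of text embeddings.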

🚀 BiomedCLIP-PubMedBERT_256-vit_base_patch16_224

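BiomedCLIP is a biomedical vision-language foundation model pretrained with contrastive learning on PMC-15M, a dataset of 15 million image-text pairs extracted from biomedical research articles in PubMed Central. It uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations. It can perform a variety of vision-language processing (VLP) tasks, such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches.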
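
To make "contrastive learning" on image-text pairs concrete, here is a minimal pure-Python sketch of a CLIP-style symmetric cross-entropy loss over a similarity matrix. This page only states that contrastive learning is used, so the exact loss form, the `clip_style_loss` name, and the toy matrices are assumptions for illustration rather than the model's actual training code.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def clip_style_loss(similarity, logit_scale=1.0):
    """Symmetric cross-entropy over an N x N image-text similarity matrix.

    similarity[i][j] is the similarity of image i and text j; matched
    pairs sit on the diagonal.
    """
    n = len(similarity)
    logits = [[logit_scale * s for s in row] for row in similarity]
    # Image-to-text: each row should put its mass on the diagonal entry.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: the same idea applied to the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (i2t + t2i) / 2


# Toy 2 x 2 similarity matrices (illustrative values only).
aligned = [[1.0, 0.0], [0.0, 1.0]]      # matched pairs score highest
misaligned = [[0.0, 1.0], [1.0, 0.0]]   # mismatched pairs score highest

aligned_loss = clip_style_loss(aligned, logit_scale=10.0)
misaligned_loss = clip_style_loss(misaligned, logit_scale=10.0)
print(aligned_loss < misaligned_loss)
```

The loss is small when each image is most similar to its own caption and large otherwise, which is the pressure that pulls matched image and text embeddings together.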

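🚀 Quick Start

The model can be used for vision-language processing tasks such as zero-shot image classification. The sections below describe how to use it.

✨ Key Features

Multimodal processing: combines biomedical images and text, and can perform vision-language processing tasks such as cross-modal retrieval, image classification, and visual question answering.
Domain adaptation: uses PubMedBERT as the text encoder and a Vision Transformer as the image encoder, with domain-specific adaptations for the biomedical domain.
Strong performance: establishes a new state of the art on a wide range of standard datasets and substantially outperforms prior VLP approaches.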

📦 Installation

Environment setup

conda create -n biomedclip python=3.10 -y
conda activate biomedclip
pip install open_clip_torch==2.23.0 transformers==4.35.2 matplotlib
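
After installation, it can be useful to confirm that the pinned versions are the ones actually present in the environment. The sketch below uses only the standard library; the `PINNED` table and the helper names are introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip install command above.
PINNED = {"open_clip_torch": "2.23.0", "transformers": "4.35.2"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package name to (expected, found, matches)."""
    report = {}
    for package, expected in pins.items():
        found = installed_version(package)
        report[package] = (expected, found, found == expected)
    return report


report = check_pins(PINNED)
for package, (expected, found, ok) in report.items():
    status = "ok" if ok else "mismatch"
    print(f"{package}: expected {expected}, found {found} ({status})")
```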

💻 Usage Examples

Basic usage

Loading the model from the Hugging Face Hub

import torch
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

# Load the model and its config files from the Hugging Face Hub
model, preprocess = create_model_from_pretrained('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
tokenizer = get_tokenizer('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')


# Zero-shot image classification
template = 'this is a photo of '
labels = [
    'adenocarcinoma histopathology',
    'brain MRI',
    'covid line chart',
    'squamous cell carcinoma histopathology',
    'immunohistochemistry histopathology',
    'bone X-ray',
    'chest X-ray',
    'pie chart',
    'hematoxylin and eosin histopathology'
]

dataset_url = 'https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/'
test_imgs = [
    'squamous_cell_carcinoma_histopathology.jpeg',
    'H_and_E_histopathology.jpg',
    'bone_X-ray.jpg',
    'adenocarcinoma_histopathology.jpg',
    'covid_line_chart.png',
    'IHC_histopathology.jpg',
    'chest_X-ray.jpg',
    'brain_MRI.jpg',
    'pie_chart.png'
]
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.eval()

context_length = 256

images = torch.stack([preprocess(Image.open(urlopen(dataset_url + img))) for img in test_imgs]).to(device)
texts = tokenizer([template + l for l in labels], context_length=context_length).to(device)
with torch.no_grad():
    image_features, text_features, logit_scale = model(images, texts)

    logits = (logit_scale * image_features @ text_features.t()).detach().softmax(dim=-1)
    sorted_indices = torch.argsort(logits, dim=-1, descending=True)

    logits = logits.cpu().numpy()
    sorted_indices = sorted_indices.cpu().numpy()

top_k = -1  # -1 means "print every label"

# Resolve the sentinel once, before the loop
top_k = len(labels) if top_k == -1 else top_k

for i, img in enumerate(test_imgs):
    print(img.split('/')[-1] + ':')
    # Print the top_k labels for this image, most probable first
    for j in range(top_k):
        jth_index = sorted_indices[i][j]
        print(f'{labels[jth_index]}: {logits[i][jth_index]}')
    print('\n')
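
The scoring step in the classification code above (scale the image-text similarities, apply a softmax across labels, then sort) can be traced on a single image with plain Python. This is a sketch with made-up similarity values and an illustrative `logit_scale`; it assumes the features are L2-normalized so that dot products behave like cosine similarities, and the real scale comes from the model's forward pass.

```python
import math


def zero_shot_probs(similarities, logit_scale):
    """Softmax over scaled image-to-label similarities for one image."""
    scaled = [logit_scale * s for s in similarities]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, probs, top_k=None):
    """Pair labels with probabilities, highest first; top_k=None keeps all."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


labels = ["chest X-ray", "brain MRI", "pie chart"]
similarities = [0.31, 0.12, 0.05]  # made-up similarities for one image

probs = zero_shot_probs(similarities, logit_scale=50.0)
ranked = rank_labels(labels, probs, top_k=2)
for label, prob in ranked:
    print(f"{label}: {prob:.4f}")
```

A larger scale sharpens the distribution toward the best-matching label, which is why small differences in similarity can still yield confident predictions.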

Loading the model from local files

import json

from urllib.request import urlopen
from PIL import Image
import torch
from huggingface_hub import hf_hub_download
from open_clip import create_model_and_transforms, get_tokenizer
from open_clip.factory import HF_HUB_PREFIX, _MODEL_CONFIGS


# Download the model weights and the config file
hf_hub_download(
    repo_id="microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",
    filename="open_clip_pytorch_model.bin",
    local_dir="checkpoints"
)
hf_hub_download(
    repo_id="microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224",
    filename="open_clip_config.json",
    local_dir="checkpoints"
)


# Load the model and the config file
model_name = "biomedclip_local"

with open("checkpoints/open_clip_config.json", "r") as f:
    config = json.load(f)
    model_cfg = config["model_cfg"]
    preprocess_cfg = config["preprocess_cfg"]


if (not model_name.startswith(HF_HUB_PREFIX)
    and model_name not in _MODEL_CONFIGS
    and config is not None):
    _MODEL_CONFIGS[model_name] = model_cfg

tokenizer = get_tokenizer(model_name)

model, _, preprocess = create_model_and_transforms(
    model_name=model_name,
    pretrained="checkpoints/open_clip_pytorch_model.bin",
    **{f"image_{k}": v for k, v in preprocess_cfg.items()},
)


# Zero-shot image classification
template = 'this is a photo of '
labels = [
    'adenocarcinoma histopathology',
    'brain MRI',
    'covid line chart',
    'squamous cell carcinoma histopathology',
    'immunohistochemistry histopathology',
    'bone X-ray',
    'chest X-ray',
    'pie chart',
    'hematoxylin and eosin histopathology'
]

dataset_url = 'https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/resolve/main/example_data/biomed_image_classification_example_data/'
test_imgs = [
    'squamous_cell_carcinoma_histopathology.jpeg',
    'H_and_E_histopathology.jpg',
    'bone_X-ray.jpg',
    'adenocarcinoma_histopathology.jpg',
    'covid_line_chart.png',
    'IHC_histopathology.jpg',
    'chest_X-ray.jpg',
    'brain_MRI.jpg',
    'pie_chart.png'
]
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.eval()

context_length = 256

images = torch.stack([preprocess(Image.open(urlopen(dataset_url + img))) for img in test_imgs]).to(device)
texts = tokenizer([template + l for l in labels], context_length=context_length).to(device)
with torch.no_grad():
    image_features, text_features, logit_scale = model(images, texts)

    logits = (logit_scale * image_features @ text_features.t()).detach().softmax(dim=-1)
    sorted_indices = torch.argsort(logits, dim=-1, descending=True)

    logits = logits.cpu().numpy()
    sorted_indices = sorted_indices.cpu().numpy()

top_k = -1  # -1 means "print every label"

# Resolve the sentinel once, before the loop
top_k = len(labels) if top_k == -1 else top_k

for i, img in enumerate(test_imgs):
    print(img.split('/')[-1] + ':')
    # Print the top_k labels for this image, most probable first
    for j in range(top_k):
        jth_index = sorted_indices[i][j]
        print(f'{labels[jth_index]}: {logits[i][jth_index]}')
    print('\n')
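
The local-loading code above reads `model_cfg` and `preprocess_cfg` from `open_clip_config.json` and passes the preprocessing entries on with an `image_` prefix. The sketch below isolates that step with a validation check added. The helper name `split_open_clip_config` and the trimmed example config are hypothetical; the real file contains more fields and different values.

```python
def split_open_clip_config(config):
    """Split a parsed open_clip_config.json into model config and image kwargs."""
    missing = [key for key in ("model_cfg", "preprocess_cfg") if key not in config]
    if missing:
        raise KeyError(f"config is missing required keys: {missing}")
    model_cfg = config["model_cfg"]
    # Same renaming as the dict comprehension in the loading code:
    # "mean" -> "image_mean", "std" -> "image_std", and so on.
    image_kwargs = {f"image_{k}": v for k, v in config["preprocess_cfg"].items()}
    return model_cfg, image_kwargs


# Hypothetical, heavily trimmed config with made-up values.
example_config = {
    "model_cfg": {"embed_dim": 512},
    "preprocess_cfg": {"mean": [0.5, 0.5, 0.5], "std": [0.25, 0.25, 0.25]},
}

model_cfg, image_kwargs = split_open_clip_config(example_config)
print(image_kwargs)
```

Failing early on a missing key gives a clearer error than a `KeyError` raised partway through model construction.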

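Advanced usage

Using the model in a Jupyter Notebook

See the example notebook.

Intended Use

This model is intended solely for (I) future research on vision-language processing and (II) reproducing the experimental results reported in the reference paper.

Primary intended use

The primary intended use is to support AI researchers building on this work. BiomedCLIP and its associated models should be helpful for exploring a range of biomedical vision-language processing research questions, especially in the radiology domain.

Out-of-scope use

Any deployed use case of the model, commercial or otherwise, is currently out of scope. Although we evaluated the models on a broad set of publicly available research benchmarks, the models and evaluations are not intended for deployed use cases. See the associated paper for more details.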

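📚 Documentation

Training data

We have released the BiomedCLIP data pipeline at https://github.com/microsoft/BiomedCLIP_data_pipeline; it automatically downloads and processes a set of articles from the PubMed Central Open Access dataset. BiomedCLIP builds on PMC-15M, a large-scale parallel image-text dataset generated by this pipeline for biomedical vision-language processing. It contains 15 million image-text pairs extracted from biomedical research articles in PubMed Central and covers a variety of biomedical image types, such as microscopy, radiography, and histology.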

References

@article{zhang2024biomedclip,
  title={A Multimodal Biomedical Foundation Model Trained from Fifteen Million Image–Text Pairs},
  author={Sheng Zhang and Yanbo Xu and Naoto Usuyama and Hanwen Xu and Jaspreet Bagga and Robert Tinn and Sam Preston and Rajesh Rao and Mu Wei and Naveen Valluri and Cliff Wong and Andrea Tupini and Yu Wang and Matt Mazzola and Swadheen Shukla and Lars Liden and Jianfeng Gao and Angela Crabtree and Brian Piening and Carlo Bifulco and Matthew P. Lungren and Tristan Naumann and Sheng Wang and Hoifung Poon},
  journal={NEJM AI},
  year={2024},
  volume={2},
  number={1},
  doi={10.1056/AIoa2400640},
  url={https://ai.nejm.org/doi/full/10.1056/AIoa2400640}
}