SegVol: open-source medical volumetric image segmentation model with point, box, and text prompts

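Developed by yuxindu

SegVol is a universal, interactive model for volumetric medical image segmentation. It performs volumetric segmentation from point prompts, box prompts, and text prompts.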

Image Segmentation

Transformers

Language: English · License: MIT · #interactive medical segmentation #multimodal prompts #3D volumetric data

Downloads: 16

Released: 4/10/2024

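Model overview

SegVol is a foundation model for volumetric medical image segmentation that can segment more than 200 anatomical structures. It was trained on 90k unlabeled CT scans and 6k labeled CT scans.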

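Model features

Multimodal prompts

Supports interactive segmentation driven by point prompts, box prompts, and text prompts.

Large-scale training data

Trained on 90k unlabeled CT scans and 6k labeled CT scans.

Broad anatomical coverage

Recognizes and segments more than 200 anatomical structures.

3D medical image processing

Optimized specifically for volumetric medical data such as CT scans.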

Model capabilities

Medical image segmentation

3D volumetric segmentation

Interactive segmentation

Multimodal prompt segmentation

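Use cases

Medical image analysis

Organ segmentation

Automatically segments organs such as the liver, kidneys, spleen, and pancreas in CT scans.

Produces high-accuracy segmentation results that can be used for clinical diagnosis and surgical planning.

Anatomical structure recognition

Recognizes and segments more than 200 different anatomical structures.

Improves the efficiency and accuracy of medical image analysis.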

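🚀 SegVol - A universal and interactive model for volumetric medical image segmentation

SegVol is a universal and interactive model for volumetric medical image segmentation. It accepts point, box, and text prompts and outputs a volumetric segmentation. Trained on 90k unlabeled computed tomography (CT) volumes and 6k labeled CTs, this foundation model supports segmentation of more than 200 anatomical categories.

The paper and code have been released.

Keywords: 3D medical SAM, volumetric image segmentation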

🚀 Quick start

🔧 Requirements

conda create -n segvol_transformers python=3.8
conda activate segvol_transformers

PyTorch v1.11.0 or later is required. Install the key dependencies with the following commands:

pip install 'monai[all]==0.9.0'
pip install einops==0.6.1
pip install transformers==4.18.0
pip install matplotlib

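Because the install commands above pin exact versions, it can help to confirm what actually ended up in the environment before loading the model. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` and the `PINNED` table are illustrative and not part of SegVol.

```python
from importlib import metadata

# Pins copied from the pip commands above (hypothetical table, for illustration).
PINNED = {"monai": "0.9.0", "einops": "0.6.1", "transformers": "4.18.0"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each package that is missing or off-pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("all pinned dependencies match")
```

An empty result means the three pinned packages match. PyTorch is not checked here because the requirement is a minimum version rather than an exact pin.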
💻 Test script

from transformers import AutoModel, AutoTokenizer
import torch

# get device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load model
# If you cannot connect to huggingface.co, download the repo and pass the local directory path to from_pretrained in place of "yuxindu/segvol"
clip_tokenizer = AutoTokenizer.from_pretrained("yuxindu/segvol")
model = AutoModel.from_pretrained("yuxindu/segvol", trust_remote_code=True, test_mode=True)
model.model.text_encoder.tokenizer = clip_tokenizer
model.eval()
model.to(device)
print('model load done')

# set case path
# you can download this case from huggingface yuxindu/segvol files and versions
ct_path = 'path/to/Case_image_00001_0000.nii.gz'
gt_path = 'path/to/Case_label_00001.nii.gz'

# set categories, corresponding to the unique values(1, 2, 3, 4, ...) in ground truth mask
categories = ["liver", "kidney", "spleen", "pancreas"]

# generate npy data format
ct_npy, gt_npy = model.processor.preprocess_ct_gt(ct_path, gt_path, category=categories)
# If you have downloaded our 25 processed datasets, you can skip to here with the processed ct_npy, gt_npy files

# go through zoom_transform to generate zoomout & zoomin views
data_item = model.processor.zoom_transform(ct_npy, gt_npy)

# add batch dim manually
for key in ('image', 'label', 'zoom_out_image', 'zoom_out_label'):
    data_item[key] = data_item[key].unsqueeze(0).to(device)

# take liver as the example
cls_idx = 0

# text prompt
text_prompt = [categories[cls_idx]]

# point prompt
point_prompt, point_prompt_map = model.processor.point_prompt_b(data_item['zoom_out_label'][0][cls_idx], device=device)   # inputs w/o batch dim, outputs w batch dim

# bbox prompt
bbox_prompt, bbox_prompt_map = model.processor.bbox_prompt_b(data_item['zoom_out_label'][0][cls_idx], device=device)   # inputs w/o batch dim, outputs w batch dim

print('prompt done')

# segvol test forward
# use_zoom: use zoom-out-zoom-in
# point_prompt_group: use point prompt
# bbox_prompt_group: use bbox prompt
# text_prompt: use text prompt
logits_mask = model.forward_test(image=data_item['image'],
      zoomed_image=data_item['zoom_out_image'],
      # point_prompt_group=[point_prompt, point_prompt_map],
      bbox_prompt_group=[bbox_prompt, bbox_prompt_map],
      text_prompt=text_prompt,
      use_zoom=False
      )

# cal dice score
dice = model.processor.dice_score(logits_mask[0][0], data_item['label'][0][cls_idx])
print(dice)

# save prediction as nii.gz file
save_path='./Case_preds_00001.nii.gz'
model.processor.save_preds(ct_path, save_path, logits_mask[0][0], 
                           start_coord=data_item['foreground_start_coord'], 
                           end_coord=data_item['foreground_end_coord'])
print('done')
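The test script reports a Dice score between the predicted logits and the ground-truth mask. As a rough illustration of what that number measures, here is a dependency-free sketch of the standard Dice coefficient. It assumes the logits are binarized at 0 (equivalent to a sigmoid threshold of 0.5) and that a small `eps` guards against empty masks; both are assumptions, and `model.processor.dice_score` may differ in detail. The function name `dice_from_logits` is hypothetical.

```python
def dice_from_logits(logits, labels, eps=1e-6):
    """Dice overlap between thresholded logits and a binary label mask.

    Both arguments are flat sequences of equal length (one value per voxel).
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same number of voxels")
    # sigmoid(x) > 0.5 is equivalent to x > 0, so threshold the raw logits at 0.
    preds = [1 if x > 0 else 0 for x in logits]
    intersection = sum(p * int(t) for p, t in zip(preds, labels))
    total = sum(preds) + sum(int(t) for t in labels)
    # Dice = 2|A ∩ B| / (|A| + |B|); eps keeps the ratio defined when both masks are empty.
    return (2 * intersection + eps) / (total + eps)


# Toy 4-voxel example: two voxels predicted as foreground, one of them correct.
score = dice_from_logits([2.0, -1.0, 0.5, -3.0], [1, 0, 0, 0])
print(round(score, 4))
```

A value near 1 means the prediction and the label overlap almost completely, and a value near 0 means they barely overlap. In practice the same arithmetic would run on flattened tensors rather than Python lists.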

💻 Training script

from transformers import AutoModel, AutoTokenizer
import torch

# get device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load model
# If you cannot connect to huggingface.co, download the repo and pass the local directory path to from_pretrained in place of "yuxindu/segvol"
clip_tokenizer = AutoTokenizer.from_pretrained("yuxindu/segvol")
model = AutoModel.from_pretrained("yuxindu/segvol", trust_remote_code=True, test_mode=False)
model.model.text_encoder.tokenizer = clip_tokenizer
model.train()
model.to(device)
print('model load done')

# set case path
# you can download this case from huggingface yuxindu/segvol files and versions
ct_path = 'path/to/Case_image_00001_0000.nii.gz'
gt_path = 'path/to/Case_label_00001.nii.gz'

# set categories, corresponding to the unique values(1, 2, 3, 4, ...) in ground truth mask
categories = ["liver", "kidney", "spleen", "pancreas"]

# generate npy data format
ct_npy, gt_npy = model.processor.preprocess_ct_gt(ct_path, gt_path, category=categories)
# If you have downloaded our 25 processed datasets, you can skip to here with the processed ct_npy, gt_npy files

# go through train transform
data_item = model.processor.train_transform(ct_npy, gt_npy)

# training example
# add batch dim manually
image, gt3D = data_item["image"].unsqueeze(0).to(device), data_item["label"].unsqueeze(0).to(device) # add batch dim

loss_step_avg = 0
for cls_idx in range(len(categories)):
    # optimizer.zero_grad()  # uncomment (together with optimizer.step() below) once an optimizer is defined
    organs_cls = categories[cls_idx]
    labels_cls = gt3D[:, cls_idx]
    print(image.shape, organs_cls, labels_cls.shape)
    loss = model.forward_train(image, train_organs=organs_cls, train_labels=labels_cls)
    loss_step_avg += loss.item()
    loss.backward()
    # optimizer.step()

loss_step_avg /= len(categories)
print(f'AVG loss {loss_step_avg}')

# save ckpt
model.save_pretrained('./ckpt')
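Both scripts index the label tensor by `cls_idx`, and the comments say that `categories` corresponds to the unique values (1, 2, 3, 4, ...) in the ground-truth mask. One way to picture that relationship is to split an integer label map into one binary mask per category. The sketch below shows this on a toy flat "volume"; it is an assumption about the layout implied by the scripts, not a description of how `preprocess_ct_gt` is implemented, and `split_label_map` is a hypothetical helper.

```python
def split_label_map(label_map, num_categories):
    """Turn an integer label map into one binary mask per category.

    label_map: flat sequence of ints, where 0 is background and k is the
    k-th category (1-based). The returned list has num_categories masks,
    so masks[cls_idx] lines up with categories[cls_idx].
    """
    return [
        [1 if value == cls_idx + 1 else 0 for value in label_map]
        for cls_idx in range(num_categories)
    ]


categories = ["liver", "kidney", "spleen", "pancreas"]
label_map = [0, 1, 1, 3, 0, 2]  # toy 6-voxel volume
masks = split_label_map(label_map, len(categories))

for name, mask in zip(categories, masks):
    print(name, mask)
```

A category that does not appear in the scan, such as the pancreas in this toy example, simply yields an all-zero mask, so the order of `categories` must match the label values even when some structures are absent.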