M3D-LaMed-Llama-2-7B Open-Source Model: A Practical Choice for 3D Medical Image Analysis

M3D LaMed Llama 2 7B

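Developed by GoodBaiBai88

M3D is a 3D medical image analysis project built on multimodal large language models. It comprises the M3D-Data dataset, the M3D-LaMed model, and the M3D-Bench evaluation benchmark.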

Image-to-Text

Transformers

License: Apache-2.0 #3D medical image analysis #multimodal large language model #medical report generation

Downloads: 209

Released: 4/27/2024

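Model Overview

M3D-LaMed is a versatile multimodal model equipped with the pre-trained M3D-CLIP vision encoder. It supports image-text retrieval, report generation, visual question answering, positioning, and segmentation.

Model Features

Multimodal 3D medical image analysis

Processes 3D medical image data for multimodal medical image analysis

Multi-task support

Performs image-text retrieval, report generation, visual question answering, positioning, and segmentation

Large-scale pre-training data

Trained on the M3D-Data dataset, which contains 120K image-text pairs and 662K instruction-response pairs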

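Model Capabilities

3D medical image analysis

Medical report generation

Visual question answering

Organ segmentation

Bounding-box annotation

Image-text retrieval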

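Use Cases

Medical Image Diagnosis

Liver region segmentation

Identifies and segments the liver region in a 3D medical image

Outputs a segmentation mask

Medical report generation

Automatically generates a text description of the findings in a 3D medical image

Outputs a natural-language report

Medical Image Analysis

Organ localization

Marks the bounding box of a specified organ in the image

Outputs bounding-box coordinates

Medical image question answering

Answers specialized questions about the content of a 3D medical image

Outputs a medical explanation of the image content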

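🚀 M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models

M3D is the first comprehensive series of work devoted to 3D medical analysis with multimodal large language models. It aims to solve complex problems in 3D medical image analysis and to support medical research and clinical applications.

Paper | Data | Code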

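✨ Key Features

The M3D series covers three components: a dataset, a model, and an evaluation benchmark.

M3D-Data: currently the largest open-source 3D medical dataset, with 120K image-text pairs and 662K instruction-response pairs for model training.
M3D-LaMed: a multimodal model built on the pre-trained M3D-CLIP vision encoder, capable of image-text retrieval, report generation, visual question answering, positioning, and segmentation.
M3D-Bench: the most comprehensive automatic evaluation benchmark, covering 8 tasks for measuring model performance across them.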

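⚠️ Important Note

We found that the previous GoodBaiBai88/M3D-LaMed-Llama-2-7B model had a problem in the segmentation task. The problem has been fixed, and a new model will be re-released in the next few days.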

🚀 Quick Start

The model is easy to use through Hugging Face.

Basic usage

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import simple_slice_viewer as ssv
import SimpleITK as sikt

device = torch.device('cuda')  # or torch.device('cpu')
dtype = torch.bfloat16  # alternatives: torch.float16, torch.float32

model_name_or_path = 'GoodBaiBai88/M3D-LaMed-Llama-2-7B'
proj_out_num = 256

# Prepare your 3D medical image:
# 1. Resize (or otherwise resample) the volume to shape 1*32*256*256.
# 2. Normalize the intensities to the range 0-1, e.g. with min-max normalization.
# 3. Save the processed volume in .npy format.
# 4. The model was not trained on 2D images, but in theory a 2D image can be interpolated to shape 1*32*256*256 and used as input.
image_path = "./Data/data/examples/example_01.npy"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=dtype,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=512,
    padding_side="right",
    use_fast=False,
    trust_remote_code=True
)

model = model.to(device=device)

# Choose one prompt depending on the task.
# Report generation:
# question = "Can you provide a caption consists of findings for this medical image?"
# Segmentation:
question = "What is liver in this image? Please output the segmentation mask."
# Bounding-box localization:
# question = "What is liver in this image? Please output the box."

# Prefix the question with proj_out_num image placeholder tokens.
image_tokens = "<im_patch>" * proj_out_num
input_txt = image_tokens + question
input_id = tokenizer(input_txt, return_tensors="pt")['input_ids'].to(device=device)

image_np = np.load(image_path)
image_pt = torch.from_numpy(image_np).unsqueeze(0).to(dtype=dtype, device=device)

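# Without seg_enable, generate() returns only the generated token ids:
# generation = model.generate(image_pt, input_id, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=1.0)
# With seg_enable=True, generate() also returns the segmentation logits.
generation, seg_logit = model.generate(image_pt, input_id, seg_enable=True, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=1.0)

generated_texts = tokenizer.batch_decode(generation, skip_special_tokens=True)
# Threshold the sigmoid probabilities at 0.5 to get a binary mask.
seg_mask = (torch.sigmoid(seg_logit) > 0.5) * 1.0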

print('question', question)
print('generated_texts', generated_texts[0])

# Show the input volume, then the predicted mask, in the slice viewer.
image = sikt.GetImageFromArray(image_np)
ssv.display(image)
seg = sikt.GetImageFromArray(seg_mask.cpu().numpy()[0])
ssv.display(seg)
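
The image preparation steps listed in the comments above (resize to 1*32*256*256, normalize to 0-1, save as .npy) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' preprocessing pipeline: the function names `resize_nearest`, `min_max_normalize`, and `prepare_volume` are hypothetical, and nearest-neighbor index sampling is a simplification chosen to avoid extra dependencies. A 2D image is treated as a one-slice volume, so it ends up repeated along the depth axis.

```python
import numpy as np

# Depth, height, width expected by the usage example above.
TARGET_SHAPE = (32, 256, 256)


def resize_nearest(volume, target_shape=TARGET_SHAPE):
    """Resample a (D, H, W) array to target_shape by nearest-neighbor indexing.

    Assumption: this stands in for whatever interpolation you prefer.
    """
    # For each axis, map every output index to a source index.
    index_grids = [
        np.minimum((np.arange(t) * s / t).astype(np.int64), s - 1)
        for s, t in zip(volume.shape, target_shape)
    ]
    return volume[np.ix_(*index_grids)]


def min_max_normalize(volume):
    """Scale intensities to [0, 1]; a constant volume maps to all zeros."""
    volume = volume.astype(np.float32)
    lo, hi = float(volume.min()), float(volume.max())
    if hi == lo:
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)


def prepare_volume(image):
    """Turn a 2D (H, W) or 3D (D, H, W) array into a 1*32*256*256 float32 array."""
    image = np.asarray(image)
    if image.ndim == 2:
        image = image[np.newaxis]  # treat a 2D image as a one-slice volume
    if image.ndim != 3:
        raise ValueError(f"expected a 2D or 3D array, got shape {image.shape}")
    resized = resize_nearest(image)
    return min_max_normalize(resized)[np.newaxis]  # add the leading channel axis


# Synthetic stand-in for a scan; replace with your own loaded volume.
rng = np.random.default_rng(0)
raw = rng.integers(-1000, 1000, size=(40, 300, 280)).astype(np.int16)
prepared = prepare_volume(raw)
print(prepared.shape, prepared.dtype)
```

The result can be written with `np.save` and used as `image_path` in the usage example. For real scans, an interpolation that accounts for voxel spacing may be preferable to nearest-neighbor sampling.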