M3D-LaMed-Llama-2-7B Open-Source Model - A Practical Choice for 3D Medical Image Analysis

M3D LaMed Llama 2 7B

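Developed by GoodBaiBai88

M3D is a 3D medical image analysis approach built on multimodal large language models. It comprises the M3D-Data dataset, the M3D-LaMed model, and the M3D-Bench evaluation benchmark.

Image-to-Text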

Transformers

License: Apache-2.0 #3D medical image analysis #Multimodal large language model #Medical report generation

Downloads: 209

Release date: 4/27/2024

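Model Overview

M3D-LaMed is a versatile multimodal model built on the M3D-CLIP pretrained vision encoder. It supports image-text retrieval, report generation, visual question answering, positioning, and segmentation.

Model Features

Multimodal 3D medical image analysis

Processes 3D medical image data for multimodal medical image analysis

Multi-task support

Performs image-text retrieval, report generation, visual question answering, positioning, and segmentation

Large-scale pretraining data

Trained on the M3D-Data dataset, which contains 120K image-text pairs and 662K instruction-response pairs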

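Model Capabilities

3D medical image analysis

Medical report generation

Visual question answering

Organ segmentation

Bounding box annotation

Image-text retrieval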

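Use Cases

Medical image diagnosis

Liver region segmentation

Identifies and segments the liver region in a 3D medical image

Outputs a segmentation mask

Medical report generation

Automatically generates a text description of findings from a 3D medical image

Produces a natural-language report

Medical image analysis

Organ localization

Marks the bounding box of a specific organ in the image

Outputs bounding box coordinates

Medical image question answering

Answers domain-specific questions about the content of a 3D medical image

Provides medical explanations of the image content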
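
The use cases above are selected purely through the text prompt. As a minimal sketch, the helper below pairs each task with the example question that appears in the project's quick-start code and prepends the image placeholder tokens in the same way. The task keys (`report`, `segmentation`, `box`) and the function name `build_input_text` are labels chosen for this illustration and are not part of the model's API; the `<im_patch>` token and the count of 256 are taken from the quick-start code.

```python
IMAGE_TOKEN = "<im_patch>"
PROJ_OUT_NUM = 256  # number of image placeholder tokens used in the quick-start code

# Example questions copied from the quick-start code.
# The dictionary keys are hypothetical labels used only in this sketch.
EXAMPLE_PROMPTS = {
    "report": "Can you provide a caption consists of findings for this medical image?",
    "segmentation": "What is liver in this image? Please output the segmentation mask.",
    "box": "What is liver in this image? Please output the box.",
}


def build_input_text(task: str, proj_out_num: int = PROJ_OUT_NUM) -> str:
    """Prefix the example question for `task` with the image placeholder tokens."""
    if task not in EXAMPLE_PROMPTS:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(EXAMPLE_PROMPTS)}")
    return IMAGE_TOKEN * proj_out_num + EXAMPLE_PROMPTS[task]


if __name__ == "__main__":
    text = build_input_text("segmentation")
    print(text.count(IMAGE_TOKEN), text[len(IMAGE_TOKEN) * PROJ_OUT_NUM:])
```

The resulting string would then be tokenized and passed to `model.generate` together with the image tensor, as in the quick-start code.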

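🚀 M3D: Advancing 3D Medical Image Analysis with Multimodal Large Language Models

M3D is the first comprehensive series of work on multimodal large language models dedicated to 3D medical analysis. It aims to address complex problems in 3D medical image analysis and to support medical research and clinical applications.

Paper | Data | Code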

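✨ Key Features

The M3D series covers three components: a dataset, a model, and an evaluation benchmark.

M3D-Data: the largest open-source 3D medical dataset to date, with 120K image-text pairs and 662K instruction-response pairs for model training.
M3D-LaMed: a multimodal model built on the M3D-CLIP pretrained vision encoder, capable of image-text retrieval, report generation, visual question answering, positioning, and segmentation.
M3D-Bench: the most comprehensive automatic evaluation benchmark, covering 8 tasks for assessing model performance across task types.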

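⚠️ Important Notice

We found that the previous GoodBaiBai88/M3D-LaMed-Llama-2-7B model had a problem in the segmentation task. The problem has been fixed, and the new model will be re-released in the next few days.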

🚀 Quick Start

The model can be loaded and used directly through Hugging Face.

Basic Usage

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import simple_slice_viewer as ssv
import SimpleITK as sikt

device = torch.device('cuda')  # or 'cpu'
dtype = torch.bfloat16  # or torch.float16, torch.float32

model_name_or_path = 'GoodBaiBai88/M3D-LaMed-Llama-2-7B'
proj_out_num = 256

# Prepare your 3D medical image:
# 1. Resize (or otherwise resample) the image to shape 1*32*256*256.
# 2. Normalize intensities to the range 0-1, for example with min-max normalization.
# 3. Save the processed array in .npy format.
# 4. The model was not trained on 2D images, but in theory a 2D image can be interpolated to 1*32*256*256 and used as input.
image_path = "./Data/data/examples/example_01.npy"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=dtype,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    model_max_length=512,
    padding_side="right",
    use_fast=False,
    trust_remote_code=True
)

model = model.to(device=device)

# question = "Can you provide a caption consists of findings for this medical image?"
question = "What is liver in this image? Please output the segmentation mask."
# question = "What is liver in this image? Please output the box."

image_tokens = "<im_patch>" * proj_out_num
input_txt = image_tokens + question
input_id = tokenizer(input_txt, return_tensors="pt")['input_ids'].to(device=device)

image_np = np.load(image_path)
image_pt = torch.from_numpy(image_np).unsqueeze(0).to(dtype=dtype, device=device)

# generation = model.generate(image_pt, input_id, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=1.0)
generation, seg_logit = model.generate(image_pt, input_id, seg_enable=True, max_new_tokens=256, do_sample=True, top_p=0.9, temperature=1.0)

generated_texts = tokenizer.batch_decode(generation, skip_special_tokens=True)
seg_mask = (torch.sigmoid(seg_logit) > 0.5) * 1.0

print('question', question)
print('generated_texts', generated_texts[0])

image = sikt.GetImageFromArray(image_np)
ssv.display(image)
seg = sikt.GetImageFromArray(seg_mask.cpu().numpy()[0])
ssv.display(seg)
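
The preparation steps listed in the comments of the code above (resize to 1*32*256*256, scale to 0-1, save as .npy) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's own preprocessing pipeline: the axis order is assumed to be (channel, depth, height, width), resizing uses simple nearest-neighbour sampling rather than a proper interpolation library, and the function names are hypothetical.

```python
import numpy as np

# Assumed axis order: (depth, height, width); the channel axis is added at the end.
TARGET_SHAPE = (32, 256, 256)


def resize_nearest(volume: np.ndarray, target_shape=TARGET_SHAPE) -> np.ndarray:
    """Resize a 3D volume by nearest-neighbour sampling along each axis."""
    out = volume
    for axis, new_len in enumerate(target_shape):
        old_len = out.shape[axis]
        # Map each output index to the closest source index on this axis.
        idx = np.round(np.linspace(0, old_len - 1, new_len)).astype(int)
        out = np.take(out, idx, axis=axis)
    return out


def min_max_normalize(volume: np.ndarray) -> np.ndarray:
    """Scale intensities to [0, 1]; a constant volume maps to all zeros."""
    volume = volume.astype(np.float32)
    lo, hi = volume.min(), volume.max()
    if hi == lo:
        return np.zeros_like(volume)
    return (volume - lo) / (hi - lo)


def preprocess(volume: np.ndarray) -> np.ndarray:
    """Return a float32 array of shape (1, 32, 256, 256) with values in [0, 1]."""
    if volume.ndim == 2:
        # Treat a 2D image as a single-slice volume; it is then repeated in depth.
        volume = volume[np.newaxis, ...]
    resized = resize_nearest(volume)
    normalized = min_max_normalize(resized)
    return normalized[np.newaxis, ...]  # add the channel dimension


if __name__ == "__main__":
    # Synthetic stand-in for a scan; replace with your own loaded array.
    rng = np.random.default_rng(0)
    fake_scan = rng.integers(-1000, 1000, size=(40, 300, 300)).astype(np.int16)
    prepared = preprocess(fake_scan)
    print(prepared.shape, prepared.dtype)
```

The result can be written with `np.save(path, prepared)` and then loaded through `image_path` in the code above. Nearest-neighbour sampling keeps the sketch dependency-free; a trilinear resize from an imaging library would usually give smoother volumes.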