Open-source MAIRA-2 model: generate radiology reports from chest X-rays, free of charge

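MAIRA-2

Developed by Microsoft

MAIRA-2 is a multimodal transformer designed to generate grounded or non-grounded radiology reports from chest X-rays.

Image-to-Text

Transformers

License: Other #ChestXRayReportGeneration #MultimodalMedicalImaging #GroundedRadiology

Downloads: 46.44k

Released: 7/29/2024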

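Model overview

MAIRA-2 consists of the image encoder RAD-DINO-MAIRA-2, a projection layer, and the language model vicuna-7b-v1.5, and focuses on report generation from chest X-rays.

Model features

Multimodal input support

Supports several input combinations, including current and prior X-rays, the indication, and the technique description

Grounded report generation

Generates reports with bounding boxes on the image that mark where each finding is located

Phrase grounding

Locates a specific finding in the image from a text phrase

Model capabilities

Medical image analysis

Radiology report generation

Image region localization

Multimodal understanding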

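Use cases

Medical research

Automated radiology report generation

Generates a radiology report from chest X-rays

Produces a narrative report of clinical findings

Grounded radiology findings

Generates a radiology report with bounding boxes on the image

Marks the image location of each finding

Phrase grounding in medical images

Locates a specific finding in the image from a text description

Returns the bounding box coordinates of the described region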

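🚀 MAIRA-2

MAIRA-2 is a multimodal transformer designed to generate grounded or non-grounded radiology reports from chest X-rays. It is intended to help medical researchers advance work on radiology report generation.

🚀 Quick start

We show how to run inference with MAIRA-2 for its three capabilities: findings generation with or without grounding, and phrase grounding.

Environment setup

To run the sample code below, install the following packages: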

pillow
protobuf
sentencepiece
torch
transformers

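Note: MAIRA-2 requires transformers>=4.46.0.dev0, so you may temporarily need to install transformers from source. Because of an incompatible commit on the transformers main branch, the current workaround is to install a transformers version at or after commit 88d960937c81a32bfb63356a2e8ecf7999619681 and before commit 0f49deacbff3e57cde45222842c0db6375e4fa43.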

pip install git+https://github.com/huggingface/transformers.git@88d960937c81a32bfb63356a2e8ecf7999619681
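Before loading the model, it can help to confirm that the packages listed above are actually installed. The following is a minimal sketch using only the Python standard library; the helper name `find_missing` is hypothetical and not part of MAIRA-2 or transformers, and the check only covers presence, not the commit-level transformers constraint described in the note.

```python
from importlib import metadata

# Distribution names copied from the package list in this section.
REQUIRED_PACKAGES = ["pillow", "protobuf", "sentencepiece", "torch", "transformers"]


def find_missing(packages: list[str]) -> list[str]:
    """Return the names in `packages` that have no installed distribution."""
    missing = []
    for name in packages:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Running the script prints either the missing package names or a confirmation, so it can serve as a quick pre-flight check.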

First, initialize the model and set it to evaluation mode.

from transformers import AutoModelForCausalLM, AutoProcessor
from pathlib import Path
import torch

model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)

device = torch.device("cuda")
model = model.eval()
model = model.to(device)

We need some data to demonstrate the forward pass. In this example, we fetch a sample from the IU-Xray dataset, which has a permissive license.

import requests
from PIL import Image

def get_sample_data() -> dict[str, Image.Image | str]:
    """
    Download chest X-rays from IU-Xray, which we didn't train MAIRA-2 on. License is CC.
    We modified this function from the Rad-DINO repository on Huggingface.
    """
    frontal_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-1001.png"
    lateral_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-2001.png"

    def download_and_open(url: str) -> Image.Image:
        response = requests.get(url, headers={"User-Agent": "MAIRA-2"}, stream=True)
        return Image.open(response.raw)

    frontal_image = download_and_open(frontal_image_url)
    lateral_image = download_and_open(lateral_image_url)

    sample_data = {
        "frontal": frontal_image,
        "lateral": lateral_image,
        "indication": "Dyspnea.",
        "comparison": "None.",
        "technique": "PA and lateral views of the chest.",
        "phrase": "Pleural effusion."  # For the phrase grounding example. This patient has pleural effusion.
    }
    return sample_data

sample_data = get_sample_data()

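Use cases 1 and 2: Findings generation with or without grounding

We can toggle whether MAIRA-2 generates a grounded report through how the inputs are preprocessed, because the two modes use different prompts. Let's start with a non-grounded report (get_grounding=False). At generation time, use max_new_tokens=300 for non-grounded reports and max_new_tokens=450 for grounded reports, to accommodate the additional box and object tokens.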

processed_inputs = processor.format_and_preprocess_reporting_input(
    current_frontal=sample_data["frontal"],
    current_lateral=sample_data["lateral"],
    prior_frontal=None,  # Our example has no prior
    indication=sample_data["indication"],
    technique=sample_data["technique"],
    comparison=sample_data["comparison"],
    prior_report=None,  # Our example has no prior
    return_tensors="pt",
    get_grounding=False,  # For this example we generate a non-grounded report
)

processed_inputs = processed_inputs.to(device)
with torch.no_grad():
    output_decoding = model.generate(
        **processed_inputs,
        max_new_tokens=300,  # Set to 450 for grounded reporting
        use_cache=True,
    )
prompt_length = processed_inputs["input_ids"].shape[-1]
decoded_text = processor.decode(output_decoding[0][prompt_length:], skip_special_tokens=True)
decoded_text = decoded_text.lstrip()  # Findings generation completions have a single leading space
prediction = processor.convert_output_to_plaintext_or_grounded_sequence(decoded_text)
print("Parsed prediction:", prediction)

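We get something like this:

There is a large right pleural effusion with associated right basilar atelectasis. The left lung is clear. No pneumothorax is identified. The cardiomediastinal silhouette and hilar contours are normal. There is no free air under the diaphragm. Surgical clips are noted in the right upper quadrant of the abdomen.

If we set get_grounding=True, MAIRA-2 generates a grounded report. For this example, the result is: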

('There is a large right pleural effusion.', [(0.055, 0.275, 0.445, 0.665)]),
('The left lung is clear.', None),
('No pneumothorax is identified.', None),
('The cardiomediastinal silhouette is within normal limits.', None),
('The visualized osseous structures are unremarkable.', None)
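The grounded output above pairs each sentence with either a list of boxes or None. A minimal sketch of how such a structure could be post-processed is shown below. It assumes the prediction is a plain Python list of (text, boxes-or-None) tuples, as printed above; the exact return type of `convert_output_to_plaintext_or_grounded_sequence` may differ, and the helper names are hypothetical.

```python
from typing import Optional

Box = tuple[float, float, float, float]
GroundedSentence = tuple[str, Optional[list[Box]]]

# Example prediction copied from the grounded report shown above.
prediction: list[GroundedSentence] = [
    ("There is a large right pleural effusion.", [(0.055, 0.275, 0.445, 0.665)]),
    ("The left lung is clear.", None),
    ("No pneumothorax is identified.", None),
    ("The cardiomediastinal silhouette is within normal limits.", None),
    ("The visualized osseous structures are unremarkable.", None),
]


def split_by_grounding(
    sentences: list[GroundedSentence],
) -> tuple[list[GroundedSentence], list[str]]:
    """Separate sentences that carry boxes from sentences that do not."""
    grounded = [(text, boxes) for text, boxes in sentences if boxes]
    ungrounded = [text for text, boxes in sentences if not boxes]
    return grounded, ungrounded


def to_plain_report(sentences: list[GroundedSentence]) -> str:
    """Drop the boxes and join the sentences into a narrative report."""
    return " ".join(text for text, _ in sentences)


if __name__ == "__main__":
    grounded, ungrounded = split_by_grounding(prediction)
    print("Sentences with boxes:", len(grounded))
    print("Sentences without boxes:", len(ungrounded))
    print(to_plain_report(prediction))
```

Keeping the boxes attached to their sentences makes it straightforward to draw only the localized findings while still printing the full report text.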

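The generated bounding box coordinates are the (x, y) coordinates of the top-left and bottom-right corners of the box, i.e. (x_topleft, y_topleft, x_bottomright, y_bottomright). They are relative to the _cropped_ image (that is, the image that MAIRA-2 ultimately receives as input), so take care when visualizing them. The processor provides a method, adjust_box_for_original_image_size, to get boxes relative to the original image shape.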
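To draw a box on the cropped image, the coordinates need to be turned into pixel positions. The sketch below assumes the values are fractions of the image width and height in the range [0, 1], which the example outputs suggest but this document does not state outright. The function name `box_to_pixels` is hypothetical, and the 1000 x 1000 size is chosen purely for illustration. It does not replace `adjust_box_for_original_image_size`, which handles the mapping back to the original image.

```python
def box_to_pixels(
    box: tuple[float, float, float, float], width: int, height: int
) -> tuple[int, int, int, int]:
    """Scale a normalized (x_topleft, y_topleft, x_bottomright, y_bottomright) box.

    Assumes each coordinate is a fraction of the cropped image's width or height.
    """
    x0, y0, x1, y1 = box
    if not all(0.0 <= value <= 1.0 for value in box):
        raise ValueError("expected normalized coordinates in [0, 1]")
    if x1 < x0 or y1 < y0:
        raise ValueError("bottom-right corner must not precede top-left corner")
    return (
        round(x0 * width),
        round(y0 * height),
        round(x1 * width),
        round(y1 * height),
    )


if __name__ == "__main__":
    # Box from the grounded report example, on an illustrative 1000 x 1000 image.
    print(box_to_pixels((0.055, 0.275, 0.445, 0.665), 1000, 1000))
```

The resulting tuple is in the (left, top, right, bottom) order that common drawing routines expect for rectangles.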

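Note that MAIRA-2 generates slightly different reports in the grounded and non-grounded settings, because its grounded reporting training data comes from a different data distribution.

Use case 3: Phrase grounding

The input here is different, because we give the model a phrase to ground in the image. Recall (from get_sample_data) that our phrase is "Pleural effusion", which we already know is present in this image.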

processed_inputs = processor.format_and_preprocess_phrase_grounding_input(
    frontal_image=sample_data["frontal"],
    phrase=sample_data["phrase"],
    return_tensors="pt",
)

processed_inputs = processed_inputs.to(device)
with torch.no_grad():
    output_decoding = model.generate(
        **processed_inputs,
        max_new_tokens=150,
        use_cache=True,
    )
prompt_length = processed_inputs["input_ids"].shape[-1]
decoded_text = processor.decode(output_decoding[0][prompt_length:], skip_special_tokens=True)
prediction = processor.convert_output_to_plaintext_or_grounded_sequence(decoded_text)

print("Parsed prediction:", prediction)

This gives us something like this:

('Pleural effusion.', [(0.025, 0.345, 0.425, 0.575)])

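As with grounded reporting, remember that the bounding box coordinates are relative to the cropped image that MAIRA-2 sees; use processor.adjust_box_for_original_image_size to get boxes adjusted to the original image shape.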
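The three use cases above call `model.generate` with different `max_new_tokens` values: 300 for non-grounded findings, 450 for grounded findings, and 150 for phrase grounding. One way to keep these in a single place is a small lookup, sketched below. The task names and the helper `generation_kwargs` are hypothetical; only the token counts and `use_cache=True` come from the examples above.

```python
# Token budgets taken from the three examples in this section.
MAX_NEW_TOKENS = {
    "findings": 300,           # non-grounded report
    "grounded_findings": 450,  # extra room for box and object tokens
    "phrase_grounding": 150,
}


def generation_kwargs(task: str) -> dict:
    """Return keyword arguments for model.generate for the given task name."""
    if task not in MAX_NEW_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    return {"max_new_tokens": MAX_NEW_TOKENS[task], "use_cache": True}


if __name__ == "__main__":
    for task in MAX_NEW_TOKENS:
        print(task, generation_kwargs(task))
```

The returned dictionary could then be unpacked alongside the processed inputs, for example `model.generate(**processed_inputs, **generation_kwargs("findings"))`.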

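✨ Key features

Multimodal design: combines an image encoder, a projection layer, and a language model to generate radiology reports from chest X-rays.
Multiple output forms: generates either non-grounded narrative text reports or grounded reports that localize the described findings with bounding boxes.
Phrase grounding: locates the finding described by an input phrase in the image.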

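📚 Detailed documentation

Model details

Model description

MAIRA-2 is composed of the image encoder RAD-DINO-MAIRA-2 (used frozen), a projection layer (trained from scratch), and the language model vicuna-7b-v1.5 (fully fine-tuned).

Developed by: Microsoft Research Health Futures
Model type: Multimodal transformer
Language(s) (NLP): English
License: MSRLA
Fine-tuned from model(s) (optional): vicuna-7b-v1.5, RAD-DINO-MAIRA-2

Uses

MAIRA-2 is intended for research purposes only and should not be used in clinical practice. Its capabilities and characteristics, including its accuracy and reliability in application settings, its fairness across different populations and uses, and its security and privacy, have not been extensively tested.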

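Direct use

The inputs to MAIRA-2 are a frontal chest X-ray and, optionally, any of the following:

A lateral view from the current study
A frontal view from the prior study, with its accompanying prior report
The indication for the current study
The technique and comparison sections for the current study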
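The optional inputs listed above line up with the keyword arguments passed to `format_and_preprocess_reporting_input` in the quick start. The sketch below gathers them into one dictionary and enforces the only requirement stated here, namely that a current frontal image is present. The helper `build_reporting_kwargs` is hypothetical, and whether the processor accepts None for every optional field is an assumption: the quick start only shows None for the prior image and prior report.

```python
from typing import Any, Optional


def build_reporting_kwargs(
    current_frontal: Any,
    *,
    current_lateral: Optional[Any] = None,
    prior_frontal: Optional[Any] = None,
    prior_report: Optional[str] = None,
    indication: Optional[str] = None,
    technique: Optional[str] = None,
    comparison: Optional[str] = None,
    get_grounding: bool = False,
) -> dict[str, Any]:
    """Collect reporting inputs under the keyword names used in the quick start."""
    if current_frontal is None:
        raise ValueError("a current frontal chest X-ray is required")
    return {
        "current_frontal": current_frontal,
        "current_lateral": current_lateral,
        "prior_frontal": prior_frontal,
        "indication": indication,
        "technique": technique,
        "comparison": comparison,
        "prior_report": prior_report,
        "return_tensors": "pt",
        "get_grounding": get_grounding,
    }


if __name__ == "__main__":
    # A string stands in for the PIL image purely for illustration.
    kwargs = build_reporting_kwargs("frontal-image-placeholder", indication="Dyspnea.")
    print(sorted(kwargs))
```

With real images, the dictionary could be unpacked into the processor call, for example `processor.format_and_preprocess_reporting_input(**kwargs)`.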

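MAIRA-2 can generate the _Findings_ section of the current study in one of two forms:

Narrative text without any image annotations (the typical report generation scenario).
A grounded report, in which every described finding is accompanied by zero or more bounding boxes indicating its location on the current frontal image.

MAIRA-2 can also perform phrase grounding. In this case, it must also be given an input phrase. It then repeats the phrase and generates a bounding box localizing the finding the phrase describes.

These use cases are illustrated by the sample code in the Quick start section above.

Out-of-scope use

MAIRA-2 was trained only on chest X-rays from adults with reports in English, and is not expected to work on other imaging modalities or anatomies. Variations in the input prompt (for example, changing the instruction) are likely to degrade performance, as the model was not optimized for arbitrary user inputs.

As noted above, this is a research model and should not be used in any real clinical or production scenario.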

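Bias, risks, and limitations

Data biases

MAIRA-2 was trained on chest X-ray report datasets from Spain (translated from the original Spanish into English) and the USA, listed below. Reporting styles, patient demographics, disease prevalence, and image acquisition protocols can vary across health systems and regions. These factors will affect how well the model generalizes.

Model errors (fabrication, omission)

As outlined in more detail in the MAIRA-2 report, the model does not perform perfectly on its tasks. Errors can therefore exist in the generated (grounded) reports.

Training details

We did not originally train MAIRA-2 using the exact model class provided here, but we have checked that its behavior is the same. We provide this class to facilitate research reuse and inference.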

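Training data

MAIRA-2 was trained on a mix of public and private chest X-ray datasets. Each example comprises one or more chest X-ray images and the associated report text, with or without grounding (spatial annotations). The model is trained to generate the _Findings_ section of the report, with or without grounding.

Dataset	Country	Non-grounded examples	Grounded examples
MIMIC-CXR	USA	55218	595*
PadChest	Spain	52828	3122
USMix (private)	USA	118031	53613

*We use the MS-CXR phrase grounding dataset to provide grounding examples for MIMIC-CXR.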
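As a quick worked example, the counts in the table can be summed to see how the training mix splits between non-grounded and grounded examples. The numbers below are copied from the table; the variable names are illustrative only.

```python
# (non-grounded examples, grounded examples) per dataset, from the table above.
TRAINING_DATA = {
    "MIMIC-CXR": (55218, 595),
    "PadChest": (52828, 3122),
    "USMix (private)": (118031, 53613),
}

total_non_grounded = sum(non_grounded for non_grounded, _ in TRAINING_DATA.values())
total_grounded = sum(grounded for _, grounded in TRAINING_DATA.values())
grounded_share = total_grounded / (total_non_grounded + total_grounded)

if __name__ == "__main__":
    print("Non-grounded examples:", total_non_grounded)
    print("Grounded examples:", total_grounded)
    print(f"Grounded share of all examples: {grounded_share:.1%}")
```

The totals come to 226077 non-grounded and 57330 grounded examples, so roughly one example in five carries spatial annotations, most of them from the private USMix dataset.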

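Environmental impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware type: NVIDIA A100 GPUs
Hours used: 1432
Cloud provider: Azure
Compute region: West US 2
Carbon emitted: 107.4 CO₂ eq (ostensibly offset by this provider)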

Citation

If you use MAIRA-2, please cite it as follows:

BibTeX:

@article{Bannur2024MAIRA2GR,
  title={MAIRA-2: Grounded Radiology Report Generation},
  author={Shruthi Bannur and Kenza Bouzid and Daniel C. Castro and Anton Schwaighofer and Anja Thieme and Sam Bond-Taylor and Maximilian Ilse and Fernando P\'{e}rez-Garc\'{i}a and Valentina Salvatelli and Harshita Sharma and Felix Meissen and Mercy Prasanna Ranjit and Shaury Srivastav and Julia Gong and Noel C. F. Codella and Fabian Falck and Ozan Oktay and Matthew P. Lungren and Maria T. A. Wetscherek and Javier Alvarez-Valle and Stephanie L. Hyland},
  journal={arXiv},
  year={2024},
  volume={abs/2406.04449},
  url={https://arxiv.org/abs/2406.04449}
}

APA:

Bannur*, S., Bouzid*, K., Castro, D. C., Schwaighofer, A., Thieme, A., Bond-Taylor, S., Ilse, M., Pérez-García, F., Salvatelli, V., Sharma, H., Meissen, F., Ranjit, M.P., Srivastav, S., Gong, J., Codella, N.C.F., Falck, F., Oktay, O., Lungren, M.P., Wetscherek, M.T., Alvarez-Valle, J., & Hyland, S. L. (2024). MAIRA-2: Grounded Radiology Report Generation. arXiv preprint abs/2406.04449.