CORe Clinical Diagnosis Prediction Model - Open-Source Prediction of ICD9 Diagnosis Codes from Admission Notes


Core Clinical Diagnosis Prediction

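Developed by DATEXIS

The CORe model is based on BioBERT and was further trained on medical data with a clinical outcome pre-training objective. It predicts ICD9 diagnosis codes from admission notes.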

Text Classification

Transformers

English #multi-label ICD9 prediction #admission note analysis #BioBERT optimization

Downloads: 789

Released: 3/2/2022

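Model Overview

This model is designed for clinical diagnosis prediction. Given a patient's admission note, it predicts multi-label ICD9 codes, covering 3-digit and 4-digit codes as well as their text descriptions.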

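Model Features

Clinical Outcome Pre-Training

The model was trained with a dedicated clinical outcome pre-training objective on clinical notes, disease descriptions and medical articles, which strengthens its understanding of the medical domain.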

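ICD Hierarchy Integration

The model predicts both 3-digit and 4-digit ICD9 codes along with their text descriptions, using this hierarchical information during training to improve prediction quality.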
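To make the hierarchy concrete: a 4-digit ICD9 code belongs to the 3-digit category formed by its leading characters (for example, 401.9 falls under 401). The sketch below maps a code to its 3-digit parent. It assumes codes are plain strings such as "4019", "401.9", "V10.3" or "E880.1"; the actual label strings in `model.config.id2label` are not documented here, and `parent_code` is a hypothetical helper, not part of the model or the transformers library.

```python
def parent_code(code: str) -> str:
    """Return the 3-digit ICD9 category for a (possibly 4- or 5-digit) ICD9 code.

    Assumed input format: "4019", "401.9", "V103", "E8801", etc.
    This format is an assumption; check model.config.id2label before relying on it.
    """
    code = code.replace(".", "").strip().upper()
    # E codes (external causes of injury) use "E" plus three digits as the category.
    if code.startswith("E"):
        return code[:4]
    # Numeric codes ("401") and V codes ("V10") use the first three characters.
    return code[:3]
```

For example, `parent_code("401.9")` and `parent_code("4019")` both return `"401"`.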

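Multi-Label Prediction

The model scores 9237 labels at once (3-digit and 4-digit ICD9 codes plus their text descriptions), covering a broad range of clinical diagnoses.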

Model Capabilities

Clinical text analysis

Medical diagnosis prediction

Multi-label classification

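Use Cases

Medical Diagnosis

Admission Diagnosis Prediction

Automatically predicts likely diagnosis codes from a patient's admission note

Outputs predictions over 9237 labels (3-digit and 4-digit ICD9 codes and their text descriptions)

Clinical Decision Support

Provides diagnosis suggestions to physicians to assist clinical decision-making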

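🚀 CORe Model - Clinical Diagnosis Prediction

The CORe (Clinical Outcome Representations) model is based on BioBERT and has been specifically pre-trained and fine-tuned for clinical diagnosis prediction. Given a patient's admission note, it outputs multi-label ICD9 code predictions to support clinical diagnosis.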

🚀 Quick Start

You can run diagnosis prediction with the CORe model in a few steps:

Load the model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

Inference example

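import torch

text = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

# BERT-based models accept at most 512 tokens, so truncate long admission notes
tokenized_input = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**tokenized_input)

# Sigmoid gives an independent probability for each label (multi-label setup)
predictions = torch.sigmoid(output.logits)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]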

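Note: For best performance, we recommend determining a separate threshold for each label (this example uses a single threshold of 0.3 for all labels).

✨ Key Features

Based on BioBERT: builds on BioBERT and the biomedical knowledge it acquired during pre-training.
Specialized pre-training: further pre-trained on clinical notes, disease descriptions and medical articles with the _Clinical Outcome Pre-Training_ objective.
Multi-label prediction: takes a patient admission note as input and outputs multi-label ICD9 code predictions.
Rich label set: the model makes predictions for 9237 labels, comprising 3-digit and 4-digit ICD9 codes and their text descriptions.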
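Because this is a multi-label model, each label receives its own sigmoid probability, and any number of labels can exceed the threshold (unlike softmax, where probabilities sum to 1 and compete). The pure-Python sketch below mirrors the decoding in the quick-start example (sigmoid plus a 0.3 threshold) and additionally sorts the results by probability for easier review. `decode_multilabel` is a hypothetical helper name, and the sorting is an illustrative addition, not something the model card prescribes.

```python
import math


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, id2label, threshold=0.3):
    """Turn one row of logits into (label, probability) pairs above the threshold.

    Each label is scored independently, so several labels can be returned at once.
    """
    results = []
    for idx, logit in enumerate(logits):
        prob = _sigmoid(logit)
        if prob > threshold:
            results.append((id2label[idx], prob))
    # Highest-probability labels first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

With the loaded model, this could be called as `decode_multilabel(output.logits[0].tolist(), model.config.id2label)`.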

📦 Installation

To use the model, install the transformers library together with PyTorch (the examples use PyTorch tensors):

pip install transformers torch

💻 Usage Examples


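Advanced Usage

In practice, you can adapt the inference step to your needs, for example by determining a separate threshold for each label to obtain more accurate predictions.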

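import torch

# Assumes tokenizer and model are loaded as in Quick Start.
# One threshold per output label (model.config.num_labels in total). The uniform 0.3
# is only a placeholder; replace it with thresholds tuned separately for each label.
thresholds = torch.full((model.config.num_labels,), 0.3)

text = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

tokenized_input = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    output = model(**tokenized_input)

predictions = torch.sigmoid(output.logits)[0]  # probabilities for the single input note
predicted_labels = []
for i, pred in enumerate(predictions):
    if pred > thresholds[i]:
        predicted_labels.append(model.config.id2label[i])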
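The model card does not say how the per-label thresholds should be chosen. One common approach, assumed here and not taken from the paper, is to pick for each label the threshold that maximizes F1 on a held-out validation set. The NumPy sketch below does a simple grid search; `tune_thresholds`, the candidate grid, and the 0.3 fallback for labels that never reach a positive F1 are all illustrative choices.

```python
import numpy as np


def tune_thresholds(probs, targets, candidates=None, default=0.3):
    """Pick, for each label, the threshold that maximizes F1 on held-out data.

    probs:   (num_examples, num_labels) sigmoid outputs
    targets: (num_examples, num_labels) 0/1 gold labels
    Labels whose F1 stays at 0 for every candidate keep the default threshold.
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets).astype(bool)
    if candidates is None:
        candidates = np.arange(0.05, 1.0, 0.05)

    num_labels = probs.shape[1]
    best_thresholds = np.full(num_labels, default, dtype=float)
    best_f1 = np.zeros(num_labels)

    for t in candidates:
        pred = probs > t
        # Per-label counts, computed column-wise over all examples.
        tp = (pred & targets).sum(axis=0)
        fp = (pred & ~targets).sum(axis=0)
        fn = (~pred & targets).sum(axis=0)
        denom = 2 * tp + fp + fn
        f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
        # Keep the first threshold that reaches each label's best F1.
        improved = f1 > best_f1
        best_thresholds[improved] = t
        best_f1[improved] = f1[improved]

    return best_thresholds
```

The returned array has one entry per label and can be wrapped with `torch.tensor(...)` to serve as `thresholds` in the advanced example.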

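📚 Detailed Documentation

Model Description

The CORe (Clinical Outcome Representations) model was introduced in the paper Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration. It is based on BioBERT and was further pre-trained on clinical notes, disease descriptions and medical articles with the _Clinical Outcome Pre-Training_ objective.

This model checkpoint is fine-tuned for diagnosis prediction. The model expects a patient admission note as input and outputs multi-label ICD9 code predictions.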
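The example input formats the admission note as upper-case section headings followed by text, with sections separated by blank lines. If your notes come as structured fields, a small helper can produce input in that same layout. This is a sketch under that assumption: `build_admission_note` and the section names are hypothetical, and the card does not specify which sections the model was fine-tuned on.

```python
def build_admission_note(sections):
    """Join admission-note sections into one string in the
    "SECTION NAME: text" layout used by the example input.

    sections: dict mapping section names to text; empty sections are skipped.
    """
    parts = []
    for name, text in sections.items():
        text = text.strip()
        if text:
            parts.append(f"{name.upper()}: {text}")
    # Sections are separated by a blank line, as in the example input.
    return "\n\n".join(parts)
```

For instance, `build_admission_note({"Chief complaint": "Headaches", "Present illness": "58yo man ..."})` yields a string starting with `"CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: "`.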

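Model Predictions

In total, the model makes predictions for 9237 labels. These comprise 3-digit and 4-digit ICD9 codes as well as the text descriptions of these codes. The 4-digit codes and text descriptions help incorporate additional topical and hierarchical information into the model during training (see Section 4.2, ICD+: Incorporation of ICD Hierarchy, in the paper). We recommend using only the 3-digit code predictions at inference time, because only these were evaluated in our work.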
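Following the recommendation to use only 3-digit predictions, the predicted labels can be filtered before use. The card does not document the label strings in `model.config.id2label`, so the pattern below is an assumption (3-digit codes stored as bare strings such as "401", "V10" or "E880"); inspect `id2label` and adjust the pattern if it differs. `keep_three_digit_codes` is a hypothetical helper.

```python
import re

# Assumed label format: 3-digit ICD9 categories such as "401", "V10" or "E880".
# 4-digit codes (e.g. "4019", "250.0") and text-description labels do not match.
THREE_DIGIT_CODE = re.compile(r"^(?:\d{3}|V\d{2}|E\d{3})$")


def keep_three_digit_codes(labels):
    """Filter predicted labels down to 3-digit ICD9 codes."""
    return [label for label in labels if THREE_DIGIT_CODE.match(label.strip())]
```

Applied to the quick-start output, this would be `keep_three_digit_codes(predicted_labels)`.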

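🔧 Technical Details

The CORe model builds on BioBERT and is adapted to clinical text through dedicated pre-training and fine-tuning. During pre-training, it learns features of clinical text from clinical notes, disease descriptions and medical articles using the _Clinical Outcome Pre-Training_ objective. During fine-tuning, it is optimized for diagnosis prediction so that it outputs multi-label ICD9 code predictions.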

📄 License

The documentation does not provide any license information.

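📦 Model Information

Attribute	Details
Model type	BioBERT-based clinical diagnosis prediction model
Training data	Clinical notes, disease descriptions and medical articles (pre-training)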

📖 Citation

If you use this model, please cite the following paper:

@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}