CORe Clinical Diagnosis Prediction Model: Open-Source Prediction of ICD9 Diagnosis Codes from Admission Notes

CORe Clinical Diagnosis Prediction

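Developed by DATEXIS

The CORe model is based on BioBERT and is trained on medical data with a Clinical Outcome Pre-Training objective. It predicts ICD9 diagnosis codes from admission notes.

Text Classification

Transformers

English #Multi-label ICD9 prediction #Admission note analysis #BioBERT-based

Downloads: 789

Released: 3/2/2022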

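Model overview

This model is designed for clinical diagnosis prediction. Given a patient's admission note, it produces multi-label ICD9 predictions covering 3-digit codes, 4-digit codes, and their text descriptions.

Model features

Clinical Outcome Pre-Training

The model is trained on clinical notes, disease descriptions, and medical articles with a dedicated Clinical Outcome Pre-Training objective, which strengthens its understanding of the medical domain.

ICD hierarchy integration

The model predicts 3-digit and 4-digit ICD9 codes together with their text descriptions, so that hierarchical information about the codes is incorporated during training.

Multi-label prediction

The model scores 9237 labels at once, so a single admission note can be assigned several diagnoses.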

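Model capabilities

Clinical text analysis

Medical diagnosis prediction

Multi-label classification

Use cases

Medical diagnosis

Admission diagnosis prediction

Automatically predicts likely diagnosis codes from a patient's admission note

Predicts over 9237 labels (3-digit and 4-digit ICD9 codes and their text descriptions)

Clinical decision support

Suggests candidate diagnoses to clinicians as an aid to clinical decision-making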

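🚀 CORe model - clinical diagnosis prediction

CORe (Clinical Outcome Representations) is a BioBERT-based model that has been further pre-trained and then fine-tuned for clinical diagnosis prediction. Given a patient's admission note, it outputs multi-label ICD9 code predictions.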

🚀 Quick start

The following steps load the CORe model and run a diagnosis prediction:

Load the model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

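Inference example

import torch

# Example admission note (chief complaint and history of present illness)
admission_note = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

# BERT-based encoders accept at most 512 tokens, so longer notes are truncated
tokenized_input = tokenizer(admission_note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**tokenized_input)

# Multi-label task: apply a sigmoid to each logit independently
predictions = torch.sigmoid(output.logits)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]

Note: for best performance, determine a separate threshold for each label. This example uses 0.3 for every label.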
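
The note above recommends per-label thresholds but does not say how to obtain them. One common approach, sketched here as an assumption rather than the authors' procedure, is to pick for each label the candidate threshold that maximizes F1 on a held-out validation set. The function names `f1_score` and `tune_thresholds`, the candidate grid, and the fallback value are all hypothetical choices made for illustration. The inputs are plain Python lists, for example `predictions.tolist()` collected over validation notes.

```python
from typing import List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for a single label; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def tune_thresholds(
    probs: List[List[float]],
    labels: List[List[int]],
    candidates: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5),  # assumed grid
    default: float = 0.3,  # fallback when no candidate yields a positive F1
) -> List[float]:
    """Return one threshold per label.

    probs and labels hold one row per validation note and one column per label.
    """
    num_labels = len(probs[0])
    thresholds = []
    for j in range(num_labels):
        y_true = [row[j] for row in labels]
        best_t, best_f1 = default, 0.0
        for t in candidates:
            y_pred = [1 if row[j] > t else 0 for row in probs]
            score = f1_score(y_true, y_pred)
            if score > best_f1:  # ties keep the earlier (lower) candidate
                best_t, best_f1 = t, score
        thresholds.append(best_t)
    return thresholds
```

Labels that never occur in the validation set keep the fallback value, since no candidate can produce a positive F1 for them. With 9237 labels many will be rare, so tuned thresholds for such labels should be treated with caution.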

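✨ Key features

Based on BioBERT: builds on BioBERT and the biomedical knowledge it acquired during pre-training.
Specialized pre-training: further pre-trained on clinical notes, disease descriptions, and medical articles with the Clinical Outcome Pre-Training objective.
Multi-label prediction: takes a patient's admission note as input and outputs multi-label ICD9 code predictions.
Rich label set: predicts over 9237 labels, comprising 3-digit and 4-digit ICD9 codes and their text descriptions.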

📦 Installation

The model requires the transformers library, and the examples on this page also use PyTorch. Both can be installed with:

pip install transformers torch

💻 Usage examples

Basic usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

admission_note = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

# Truncate to the 512-token limit of BERT-based encoders
tokenized_input = tokenizer(admission_note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, no gradients needed
    output = model(**tokenized_input)

# Sigmoid gives an independent probability per label; keep labels above 0.3
predictions = torch.sigmoid(output.logits)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]
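
As an alternative to a fixed cut-off, the labels can be ranked by probability and the highest-scoring ones inspected. The helper below is a minimal sketch in plain Python. The name `top_k_labels` is hypothetical and not part of transformers. It assumes the probabilities for one note are passed as a list (for example `predictions[0].tolist()`) and that `id2label` maps integer ids to label strings, as `model.config.id2label` does.

```python
from typing import Dict, List, Tuple


def top_k_labels(
    probs: List[float], id2label: Dict[int, str], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k most probable labels with their probabilities, highest first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]
```

With the variables from the example above, a call would look like `top_k_labels(predictions[0].tolist(), model.config.id2label, k=5)`.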

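Advanced usage

In practice the decision rule can be adapted to the application, for example by using a separate threshold for each label to obtain more accurate predictions.

import torch

# Placeholder per-label thresholds, one entry per label.
# Replace the uniform 0.3 with values tuned on validation data.
thresholds = torch.full((model.config.num_labels,), 0.3)

admission_note = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

tokenized_input = tokenizer(admission_note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    output = model(**tokenized_input)

predictions = torch.sigmoid(output.logits)
# Compare each probability with the threshold of its own label
predicted_labels = []
for i, pred in enumerate(predictions[0]):
    if pred > thresholds[i]:
        predicted_labels.append(model.config.id2label[i])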

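📚 Detailed documentation

Model description

The CORe (Clinical Outcome Representations) model was introduced in the paper Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration. It is based on BioBERT and further pre-trained on clinical notes, disease descriptions, and medical articles with the Clinical Outcome Pre-Training objective.

This model checkpoint is fine-tuned for the diagnosis prediction task. It expects a patient's admission note as input and outputs multi-label ICD9 code predictions.

Model predictions

The model makes predictions over a total of 9237 labels. These labels comprise 3-digit and 4-digit ICD9 codes as well as the text descriptions of those codes. The 4-digit codes and text descriptions help incorporate additional topical and hierarchical information into the model during training (see Section 4.2, ICD+: Incorporation of ICD Hierarchy, in the paper). The authors recommend using only the 3-digit code predictions at inference time, because only those codes were evaluated in their work.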
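
The recommendation to keep only 3-digit codes can be applied as a post-processing filter on the predicted labels. The sketch below assumes that 3-digit codes appear in `model.config.id2label` as three-character strings such as "431" or "V58", without a decimal point. That format is an assumption, not something this page confirms, so the label map should be inspected before relying on the pattern. The names `THREE_DIGIT_CODE`, `three_digit_label_ids`, and `keep_three_digit` are hypothetical.

```python
import re
from typing import Dict, List

# ASSUMPTION: a 3-digit ICD9 label is exactly three characters, either all
# digits or a leading V/E followed by two digits. Adjust after checking
# the actual strings in model.config.id2label.
THREE_DIGIT_CODE = re.compile(r"[0-9VE][0-9]{2}")


def three_digit_label_ids(id2label: Dict[int, str]) -> List[int]:
    """Ids of all labels that look like 3-digit ICD9 codes, in ascending order."""
    return [i for i, label in sorted(id2label.items()) if THREE_DIGIT_CODE.fullmatch(label)]


def keep_three_digit(predicted_labels: List[str]) -> List[str]:
    """Drop 4-digit codes and text descriptions from a list of predicted labels."""
    return [label for label in predicted_labels if THREE_DIGIT_CODE.fullmatch(label)]
```

Applied to the earlier examples, `keep_three_digit(predicted_labels)` would retain only the labels that match the assumed 3-digit format.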

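🔧 Technical details

CORe builds on BioBERT and adapts it to clinical text in two stages. In the pre-training stage, the model is trained on clinical notes, disease descriptions, and medical articles with the Clinical Outcome Pre-Training objective to learn representations of clinical text. In the fine-tuning stage, it is optimized for the diagnosis prediction task so that it outputs multi-label ICD9 code predictions.

📄 License

The source documentation does not state a license.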

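📦 Model information

Property	Details
Model type	BioBERT-based clinical diagnosis prediction model
Training data	Clinical notes, disease descriptions, and medical articles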

📖 Citation

If you use this model, please cite the following paper:

@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}