CORe Model - Clinical Diagnosis Prediction
The CORe model is designed for clinical diagnosis prediction. It addresses the challenge of accurately predicting clinical outcomes from patient admission notes, providing valuable insights for medical diagnosis and treatment planning.
Quick Start
You can load the model via the transformers library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
```
Features
- Specialized Pre-training: Based on BioBERT, the model is further pre-trained on clinical notes, disease descriptions, and medical articles with a specialized Clinical Outcome Pre-Training objective.
- Multi-label Prediction: The model expects patient admission notes as input and outputs multi-label ICD9-code predictions.
- Rich Label Information: It makes predictions on a total of 9237 labels, including 3- and 4-digit ICD9 codes and their textual descriptions.
Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
```
Advanced Usage
```python
import torch

# Renamed from `input` to avoid shadowing the Python built-in
note = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

tokenized_input = tokenizer(note, return_tensors="pt")
output = model(**tokenized_input)

# Sigmoid converts logits to independent per-label probabilities (multi-label task)
predictions = torch.sigmoid(output.logits)

# Keep labels whose probability exceeds the threshold (0.3 here)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]
```
Note: For the best performance, we recommend determining the threshold (0.3 in this example) individually per label.
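Per-label thresholding can be sketched in plain Python. The threshold values below are purely illustrative (not tuned on real data); in practice each label's threshold is typically chosen on a validation set, e.g. by maximizing per-label F1.

```python
def apply_thresholds(probs, thresholds, default=0.3):
    """Return indices of labels whose probability exceeds their
    per-label threshold, falling back to a default threshold."""
    return [i for i, p in enumerate(probs)
            if p > thresholds.get(i, default)]

# Example: sigmoid probabilities for 5 labels
probs = [0.45, 0.10, 0.35, 0.80, 0.25]
# Stricter (illustrative) thresholds for labels 0 and 3
thresholds = {0: 0.5, 3: 0.7}

selected = apply_thresholds(probs, thresholds)
print(selected)  # [2, 3]
```

Label 0 falls below its stricter threshold (0.45 < 0.5), while labels 2 and 3 clear theirs.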
Documentation
Model description
The CORe (Clinical Outcome Representations) model is introduced in the paper Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration. It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialized Clinical Outcome Pre-Training objective.
This model checkpoint is fine-tuned on the task of diagnosis prediction. The model expects patient admission notes as input and outputs multi-label ICD9-code predictions.
Model Predictions
The model makes predictions on a total of 9237 labels. These contain 3- and 4-digit ICD9 codes and textual descriptions of these codes. The 4-digit codes and textual descriptions help to incorporate further topical and hierarchical information into the model during training (see Section 4.2 ICD+: Incorporation of ICD Hierarchy in our paper). We recommend using only the 3-digit code predictions at inference time, because only those have been evaluated in our work.
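A simple filter for keeping only 3-digit code predictions might look like the sketch below. The label strings and the digit-only check are assumptions for illustration: the checkpoint's actual `id2label` strings may differ, and real ICD9 also has V- and E-codes (e.g. "V01", "E800") that a digit-only check would miss.

```python
def is_three_digit_code(label: str) -> bool:
    """Heuristic: True for plain 3-digit numeric ICD9 codes like '401';
    False for 4-digit codes like '401.9' and for textual descriptions.
    Adjust to the checkpoint's actual label format before use."""
    return len(label) == 3 and label.isdigit()

# Hypothetical predicted labels, mixing code granularities and a description
predicted_labels = ["401", "401.9", "428", "acute kidney failure"]
three_digit_only = [l for l in predicted_labels if is_three_digit_code(l)]
print(three_digit_only)  # ['401', '428']
```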
More Information
For all the details about CORe and contact info, please visit CORe.app.datexis.com.
Citation
The citation information for this model is as follows:
```bibtex
@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19-23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021}
}
```
Installation
The model is loaded through the transformers library, and the post-processing examples use torch. Install them if they are not already available:

```shell
pip install transformers torch
```
Technical Details
The model is based on BioBERT and undergoes specialized pre-training on clinical data. The use of 4-digit ICD9 codes and their textual descriptions during training helps incorporate additional hierarchical and topical information. At inference time, however, only the 3-digit code predictions are recommended, as they are the ones evaluated in the research. The model uses the transformers library for loading and inference, and the torch library for post-processing the model outputs.
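The sigmoid post-processing mentioned above treats each label independently, which is what makes this a multi-label (rather than softmax/single-label) task. A dependency-free sketch of that decision step, using math.exp in place of torch.sigmoid:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative logits for three labels
logits = [2.0, -1.0, 0.0]

# Each label gets its own independent probability; they need not sum to 1
probs = [sigmoid(x) for x in logits]

# Multi-label decision: any label above the threshold is predicted
selected = [i for i, p in enumerate(probs) if p > 0.3]
print(selected)  # [0, 2]
```

With a softmax, raising one label's score would lower the others; with per-label sigmoids, each diagnosis is decided on its own, which matches a patient having several conditions at once.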