ClinicalBERT - Bio + Discharge Summary BERT Model
ClinicalBERT is a specialized BERT model initialized from BioBERT and trained on discharge summaries from MIMIC. It provides contextual embeddings for clinical NLP tasks such as named entity recognition (NER) and natural language inference (NLI).
Quick Start
Load the model via the transformers library:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
Features
The Publicly Available Clinical BERT Embeddings paper describes four clinicalBERT models. This particular model is initialized from BioBERT and trained only on discharge summaries from MIMIC.
Installation
Install the transformers library (for example, pip install transformers); no other installation is required beyond the model-loading code shown in the Quick Start section.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
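Once loaded, the model can produce contextual embeddings. The following is a minimal sketch, assuming PyTorch is installed; the example sentence and the mean-pooling step are illustrative additions, not part of the original model card.

import torch

# Example clinical sentence (illustrative only)
text = "The patient was discharged home in stable condition on aspirin 81 mg daily."

# Tokenize and run a forward pass without gradient tracking
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)

# Last hidden state: one 768-dimensional vector per input token
token_embeddings = outputs.last_hidden_state  # shape: (1, sequence_length, 768)

# A simple sentence-level embedding via mean pooling over tokens
sentence_embedding = token_embeddings.mean(dim=1)  # shape: (1, 768)
print(sentence_embedding.shape)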
Documentation
Pretraining Data
The Bio_Discharge_Summary_BERT model was trained on all discharge summaries from MIMIC-III, a database containing electronic health records from ICU patients at the Beth Israel Hospital in Boston, MA. For more details on MIMIC, see the MIMIC-III documentation on PhysioNet. All notes from the NOTEEVENTS table were included (~880M words).
Model Pretraining
Note Preprocessing
Each note in MIMIC was first split into sections using a rules-based section splitter (e.g. discharge summary notes were split into "History of Present Illness", "Family History", "Brief Hospital Course", etc. sections). Each section was then split into sentences using SciSpacy (the en_core_sci_md tokenizer).
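As an illustration of the SciSpacy step only (the rules-based section splitter is not shown), here is a minimal sketch of sentence splitting, assuming scispacy and the en_core_sci_md model are installed; the sample section text is invented.

import spacy

# Load the SciSpacy model used for sentence splitting (assumes en_core_sci_md is installed)
nlp = spacy.load("en_core_sci_md")

# Invented example of one note section, for illustration only
section_text = (
    "Brief Hospital Course: The patient was admitted for chest pain. "
    "Serial troponins were negative. She was discharged home in stable condition."
)

# Split the section text into sentences
sentences = [sent.text for sent in nlp(section_text).sents]
print(sentences)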
Pretraining Procedures
The model was trained using code from Google's BERT repository on a GeForce GTX TITAN X 12 GB GPU. Model parameters were initialized with BioBERT (BioBERT-Base v1.0 + PubMed 200K + PMC 270K).
Pretraining Hyperparameters
We used a batch size of 32, a maximum sequence length of 128, and a learning rate of 5 · 10⁻⁵ for pre-training our models. The models trained on all MIMIC notes were trained for 150,000 steps. The dup factor for duplicating input data with different masks was set to 5. All other default parameters were used (specifically, masked language model probability = 0.15 and max predictions per sequence = 20).
More Information
Refer to the original paper, Publicly Available Clinical BERT Embeddings (NAACL Clinical NLP Workshop 2019), for additional details and performance on NLI and NER tasks.
Questions?
Post a GitHub issue on the clinicalBERT repo or email emilya@mit.edu with any questions.
License
This project is licensed under the MIT license.