đ BioClinical ModernBERT
BioClinical ModernBERT offers two sizes: base (150M parameters) and large (396M parameters). You can find the model training checkpoints here, and our code is available in our GitHub repository.
đ Quick Start
BioClinical ModernBERT is a domain-adapted encoder based on ModernBERT. It can be used directly with the `transformers` library starting from v4.48.0. You can install the required library with the following command:

```bash
pip install -U "transformers>=4.48.0"
```
⨠Features
- Two Sizes: Available in base (150M parameters) and large (396M parameters) versions.
- Long-Context Processing: Supports an 8,192-token context length, which is beneficial for long biomedical and clinical documents.
- Trained on Large Corpus: Trained on over 53.5 billion tokens from the largest biomedical and clinical corpus to date.
- Diverse Data Sources: Leverages 20 datasets from diverse institutions, domains, and geographic regions.
đĻ Installation
You can install the necessary `transformers` library with the following command:

```bash
pip install -U "transformers>=4.48.0"
```
đģ Usage Examples
Basic Usage
Since BioClinical ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`.

Using `AutoModelForMaskedLM`:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "thomas-sounack/BioClinical-ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "Mitochondria is the powerhouse of the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] position and take the highest-scoring token at that position.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(axis=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
```
Using a pipeline:
```python
import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="thomas-sounack/BioClinical-ModernBERT-base",
    torch_dtype=torch.bfloat16,
)

input_text = "[MASK] is a disease caused by an uncontrolled division of abnormal cells in a part of the body."
results = pipe(input_text)
pprint(results)
```
Advanced Usage
To use BioClinical ModernBERT for downstream tasks such as classification, retrieval, or question answering, fine-tune it following standard BERT fine-tuning recipes, as in the sketch below.
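For instance, here is a minimal fine-tuning sketch for sequence classification with the Hugging Face `Trainer`. The dataset (`imdb`), label count, and hyperparameters are placeholders for illustration only, not an official recipe; swap in your own labeled clinical data with `text` and `label` columns.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "thomas-sounack/BioClinical-ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels=2 is a placeholder; set it to the number of classes in your task.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Placeholder dataset: any dataset with "text" and "label" columns works the same way.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

# Illustrative hyperparameters, not tuned values from the paper.
args = TrainingArguments(
    output_dir="bioclinical-modernbert-finetuned",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

The same pattern applies to other heads (for example token classification) by swapping the `AutoModel` class and the dataset columns.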
â ī¸ Important Note
If your GPU supports it, we recommend using BioClinical ModernBERT with Flash Attention 2 to reach the highest efficiency. To do so, install Flash Attention as follows, then use the model as normal:
```bash
pip install flash-attn
```
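As an illustration, one way to explicitly request Flash Attention 2 when loading the model is via the `attn_implementation` argument (this assumes a supported GPU and a working `flash-attn` installation):

```python
import torch
from transformers import AutoModelForMaskedLM

# Assumes a Flash Attention 2 compatible GPU and that flash-attn is installed.
model = AutoModelForMaskedLM.from_pretrained(
    "thomas-sounack/BioClinical-ModernBERT-base",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```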
đĄ Usage Tip
Like ModernBERT, BioClinical ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except that you can omit the `token_type_ids` parameter.
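For example, here is a quick, illustrative check of what the tokenizer returns for a sentence pair (the example sentences are arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thomas-sounack/BioClinical-ModernBERT-base")

# Encode a sentence pair and inspect the returned fields. A token_type_ids entry,
# if present, can simply be left out of the model inputs.
enc = tokenizer("No acute distress.", "Patient denies chest pain.", return_tensors="pt")
print(list(enc.keys()))
```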
đ Documentation
Model Summary
BioClinical ModernBERT builds on ModernBERT base and large. It brings long-context processing and significant improvements in speed and performance for biomedical and clinical NLP. By using 20 diverse clinical datasets, it addresses a key limitation of prior clinical encoders.
Training
Data
BioClinical ModernBERT is trained on 50.7B tokens of biomedical text from PubMed and PMC, and 2.8B tokens of clinical text from 20 datasets. The details are shown in the following tables:

| Property | Details |
|---|---|
| Model Type | BioClinical ModernBERT (base and large) |
| Training Data | 50.7B tokens from PubMed and PMC, 2.8B tokens from 20 clinical datasets |
| Name | Country | Clinical Source | Clinical Context | Samples | Tokens (M) |
|---|---|---|---|---|---|
| ACI-BENCH | US | Clinical Notes | Not Reported | 207 | 0.1 |
| ADE Corpus | Several | Clinical Notes | Not Reported | 20,896 | 0.5 |
| Brain MRI Stroke | Korea | Radiology Reports | Neurology | 2,603 | 0.2 |
| CheXpert Plus | US | Radiology Reports | Pulmonology | 223,460 | 60.6 |
| CHIFIR | Australia | Pathology Reports | Hematology / Oncology | 283 | 0.1 |
| CORAL | US | Progress Notes | Hematology / Oncology | 240 | 0.7 |
| Eye Gaze CXR | US | Radiology Reports | Pulmonology | 892 | 0.03 |
| Gout Chief Complaints | US | Chief Complaint | Internal Medicine | 8,429 | 0.2 |
| ID-68 | UK | Clinical Notes | Psychology | 78 | 0.02 |
| Inspect | US | Radiology Reports | Pulmonology | 22,259 | 2.8 |
| MedNLI | US | Clinical Notes | Internal Medicine | 14,047 | 0.5 |
| MedQA | US | National Medical Board Examination | Not Reported | 14,366 | 2.0 |
| MIMIC-III | US | Clinical Notes | Internal Medicine | 2,021,411 | 1,047.7 |
| MIMIC-IV Note | US | Clinical Notes | Internal Medicine | 2,631,243 | 1,765.7 |
| MTSamples | Not Reported | Clinical Notes | Internal Medicine | 2,358 | 1.7 |
| Negex | US | Discharge Summaries | Not Reported | 2,056 | 0.1 |
| PriMock57 | UK | Simulated Patient Care | Internal Medicine | 57 | 0.01 |
| Q-Pain | US | Clinical Vignettes | Palliative Care | 51 | 0.01 |
| REFLACX | US | Radiology Reports | Pulmonology | 2,543 | 0.1 |
| Simulated Resp. Interviews | Canada | Simulated Patient Care | Pulmonology | 272 | 0.6 |
Methodology
BioClinical ModernBERT base is trained in two phases. It is initialized from the last stable-phase checkpoint of ModernBERT base and trained with a learning rate of 3e-4 and a batch size of 72.
- Phase 1: Train on 160.5B tokens from PubMed, PMC, and the 20 clinical datasets. The learning rate remains constant, and the masking probability is set to 30% (see the masking sketch after this list).
- Phase 2: Train only on the 20 clinical datasets. The masking probability is reduced to 15%. The model is trained for 3 epochs with a 1-sqrt learning rate decay.
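The snippet below is an illustrative sketch only, not the original training code: it simply shows how the two masking probabilities described above map onto the standard `transformers` MLM data collator.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("thomas-sounack/BioClinical-ModernBERT-base")

# Phase 1 masks 30% of tokens; Phase 2 lowers the masking probability to 15%.
phase_1_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.30)
phase_2_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Example batch of one tokenized sentence (placeholder text).
batch = phase_1_collator([tokenizer("Patient presents with shortness of breath.")])
print(batch["input_ids"].shape, batch["labels"].shape)
```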
Evaluation
The following table shows the evaluation results of BioClinical ModernBERT compared with other models:
| | Model | Context Length | ChemProt | Phenotype | COS | Social History | DEID |
|---|---|---|---|---|---|---|---|
| Base | BioBERT | 512 | 89.5 | 26.6 | 94.9 | 55.8 | 74.3 |
| | Clinical BERT | 512 | 88.3 | 25.8 | 95.0 | 55.2 | 74.2 |
| | BioMed-RoBERTa | 512 | 89.0 | 36.8 | 94.9 | 55.2 | 81.1 |
| | Clinical-BigBird | 4096 | 87.4 | 26.5 | 94.0 | 53.3 | 71.2 |
| | Clinical-Longformer | 4096 | 74.2 | 46.4 | 95.2 | 56.8 | 82.3 |
| | Clinical ModernBERT | 8192 | 86.9 | 54.9 | 93.7 | 53.8 | 44.4 |
| | ModernBERT-base | 8192 | 89.5 | 48.4 | 94.0 | 53.1 | 78.3 |
| | BioClinical ModernBERT-base | 8192 | 89.9 | 58.1 | 95.1 | 58.5 | 82.7 |
| Large | ModernBERT-large | 8192 | 90.2 | 58.3 | 94.4 | 54.8 | 82.1 |
| | BioClinical ModernBERT-large | 8192 | 90.8 | 60.8 | 95.1 | 57.1 | 83.8 |
đ License
We release the BioClinical ModernBERT base and large model weights and training checkpoints under the MIT license.
đ Citation
If you use BioClinical ModernBERT in your work, please cite our preprint:
```bibtex
@misc{sounack2025bioclinicalmodernbertstateoftheartlongcontext,
  title={BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP},
  author={Thomas Sounack and Joshua Davis and Brigitte Durieux and Antoine Chaffin and Tom J. Pollard and Eric Lehman and Alistair E. W. Johnson and Matthew McDermott and Tristan Naumann and Charlotta Lindvall},
  year={2025},
  eprint={2506.10896},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.10896},
}
```