BiomedVLP-CXR-BERT-general open-source model - Optimize the processing of chest X-ray radiology text

Developed by Microsoft

CXR-BERT is a specialized language model developed for the chest X-ray domain, optimized for radiology text processing through improved vocabulary and pretraining procedures

Large Language Model

Transformers

English

Open Source License: MIT

#Chest X-ray Specialized #Multimodal Contrastive Learning #Clinical NLP

Downloads 12.31k

Release Time : 5/5/2022

Model Overview

A BERT-based biomedical pretraining model focused on chest X-ray report analysis, achieving text-image representation alignment through multi-stage training

Model Features

Domain-optimized Vocabulary

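Tokenizer vocabulary built from biomedical literature and clinical reports, yielding shorter token sequences on radiology text (58.07 tokens on average, versus 78.98 for ClinicalBERT and 63.55 for PubMedBERT)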

Multi-stage Pretraining

Three-phase training: MLM tasks → radiology domain adaptation → multimodal contrastive learning

Cross-modal Alignment

CLIP-like framework for text-image representation space alignment

Model Capabilities

Radiology Natural Language Inference

Medical Text Mask Prediction

Zero-shot Medical Image Localization

Cross-modal Retrieval

Use Cases

Medical Research

Radiology Report Analysis

Automatically parse clinical findings in chest X-ray reports

CXR-BERT (after Phase-III + joint training) reaches 65.21% accuracy on the RadNLI task

Medical Image Retrieval

Retrieve relevant medical images based on text descriptions

BioViL-L with the CXR-BERT text encoder obtains an average CNR score of 1.142 for phrase grounding on the MS-CXR dataset

Clinical Assistance

Imaging Diagnosis Support

Generate standardized descriptive text corresponding to imaging findings

🚀 CXR-BERT-general

CXR-BERT is a chest X-ray (CXR) domain-specific language model. It leverages an improved vocabulary, a novel pretraining procedure, weight regularization, and text augmentations. The resulting model shows better performance in radiology natural language inference, radiology masked language model token prediction, and downstream vision-language processing tasks such as zero-shot phrase grounding and image classification.

First, we pretrain CXR-BERT-general from a randomly initialized BERT model via Masked Language Modeling (MLM) on abstracts from PubMed and clinical notes from the publicly available MIMIC-III and MIMIC-CXR. In this regard, the general model is expected to be applicable for research in clinical domains other than chest radiology through domain-specific fine-tuning.

CXR-BERT-specialized is continually pretrained from CXR-BERT-general to further specialize in the chest X-ray domain. At the final stage, CXR-BERT is trained in a multi-modal contrastive learning framework, similar to the CLIP framework. The latent representation of the [CLS] token is used to align text/image embeddings.
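The contrastive stage described above can be illustrated with a minimal sketch of a CLIP-style symmetric contrastive loss. This is not the authors' training code: the function names, the temperature of 0.07, and the use of plain Python lists in place of projected [CLS] and image embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_logits(text_embs, image_embs, temperature):
    """Cosine similarity between every text/image pair, divided by the temperature."""
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [
        [sum(a * b for a, b in zip(t, i)) / temperature for i in images]
        for t in texts
    ]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row k's correct class is column k (the matched pair)."""
    total = 0.0
    for k, row in enumerate(logits):
        row_max = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Average of the text-to-image and image-to-text losses.

    text_embs stands in for projected [CLS] embeddings and image_embs for
    image embeddings; the temperature value is an assumption, not taken
    from the model card.
    """
    logits = similarity_logits(text_embs, image_embs, temperature)
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))
```

With matched text/image pairs on the diagonal, the loss is small when each report embedding is closest to its own image and large when the pairs are shuffled, which is the behavior that pushes the two representation spaces into alignment.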

✨ Features

Model variations

Model	Model identifier on HuggingFace	Vocabulary	Note
CXR-BERT-general	microsoft/BiomedVLP-CXR-BERT-general	PubMed & MIMIC	Pretrained for biomedical literature and clinical domains
CXR-BERT-specialized (after multi-modal training)	microsoft/BiomedVLP-CXR-BERT-specialized	PubMed & MIMIC	Pretrained for chest X-ray domain
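As a rough sketch of how the general model's identifier might be used for masked token prediction, the snippet below relies on the Hugging Face transformers fill-mask pipeline. It assumes that the checkpoint loads as a standard BERT masked language model and that its mask token is [MASK]; neither is stated on this page. The helper names mask_word and top_predictions are hypothetical.

```python
MODEL_ID = "microsoft/BiomedVLP-CXR-BERT-general"


def mask_word(sentence, word, mask_token="[MASK]"):
    """Replace the first occurrence of `word` with the mask token.

    "[MASK]" is the usual BERT mask token and is assumed here.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(masked_sentence, top_k=5):
    """Return (token, score) pairs for the masked position.

    transformers is imported lazily so that mask_word can be used without
    it installed. The first call downloads the checkpoint from the hub.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    predictions = fill_mask(masked_sentence, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

A typical call would be top_predictions(mask_word("No pleural effusion or pneumothorax.", "effusion")). In line with the intended-use statement, this is meant for research experiments only.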

📚 Documentation

Citation

The corresponding manuscript has been accepted for presentation at the European Conference on Computer Vision (ECCV) 2022.

@misc{https://doi.org/10.48550/arxiv.2204.09817,
  doi = {10.48550/ARXIV.2204.09817},
  url = {https://arxiv.org/abs/2204.09817},
  author = {Boecking, Benedikt and Usuyama, Naoto and Bannur, Shruthi and Castro, Daniel C. and Schwaighofer, Anton and Hyland, Stephanie and Wetscherek, Maria and Naumann, Tristan and Nori, Aditya and Alvarez-Valle, Javier and Poon, Hoifung and Oktay, Ozan},
  title = {Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing},
  publisher = {arXiv},
  year = {2022},
}

Model Use

Intended Use

This model is intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper.

Primary Intended Use

The primary intended use is to support AI researchers building on top of this work. CXR-BERT and its associated models should be helpful for exploring various clinical NLP & VLP research questions, especially in the radiology domain.

Out-of-Scope Use

⚠️ Important Note

Any deployed use case of the model, commercial or otherwise, is currently out of scope. Although we evaluated the models using a broad set of publicly available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to the associated paper for more details.

Data

This model builds upon existing publicly available datasets:

PubMed
MIMIC-III
MIMIC-CXR

These datasets reflect a broad variety of sources ranging from biomedical abstracts to intensive care unit notes to chest X-ray radiology notes. In the MIMIC-CXR dataset, the radiology notes are accompanied by their associated chest X-ray DICOM images.

Performance

We demonstrate that this language model achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports.

Highlights of the comparison with other common models, including ClinicalBERT and PubMedBERT:

Model	RadNLI accuracy (MedNLI transfer)	Mask prediction accuracy	Avg. # tokens after tokenization	Vocabulary size
RadNLI baseline	53.30	-	-	-
ClinicalBERT	47.67	39.84	78.98 (+38.15%)	28,996
PubMedBERT	57.71	35.24	63.55 (+11.16%)	28,895
CXR-BERT (after Phase-III)	60.46	77.72	58.07 (+1.59%)	30,522
CXR-BERT (after Phase-III + Joint Training)	65.21	81.58	58.07 (+1.59%)	30,522
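The percentages in parentheses in the token column appear to be measured against a reference length that this page does not state. To compare the tokenizers directly with each other, the average token counts from the table can be related to CXR-BERT's count, as in the short sketch below. The dictionary and function names are hypothetical; only the three averages come from the table.

```python
# Average number of tokens after tokenization, copied from the table above.
AVG_TOKENS = {
    "ClinicalBERT": 78.98,
    "PubMedBERT": 63.55,
    "CXR-BERT": 58.07,
}


def extra_tokens_vs(model, reference="CXR-BERT"):
    """Fraction of additional tokens `model` produces compared with `reference`."""
    return AVG_TOKENS[model] / AVG_TOKENS[reference] - 1.0


for name in ("ClinicalBERT", "PubMedBERT"):
    print(f"{name}: {extra_tokens_vs(name):+.1%} tokens relative to CXR-BERT")
```

Shorter sequences mean fewer word pieces per radiology term, which is the practical effect of the domain-specific vocabulary.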

CXR-BERT also contributes to better vision-language representation learning through its improved text encoding capability. Below is the zero-shot phrase grounding performance on the MS-CXR dataset, which evaluates the quality of image-text latent representations.

Vision–Language Pretraining Method	Text Encoder	MS-CXR Phrase Grounding (Avg. CNR Score)
Baseline	ClinicalBERT	0.769
Baseline	PubMedBERT	0.773
ConVIRT	ClinicalBERT	0.818
GLoRIA	ClinicalBERT	0.930
BioViL	CXR-BERT	1.027
BioViL-L	CXR-BERT	1.142
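For readers unfamiliar with the metric in the table, the sketch below computes a contrast-to-noise ratio (CNR) for a single phrase and image. It assumes CNR is defined as the absolute difference between the mean similarity inside and outside the annotated bounding box, divided by the square root of the sum of the two variances. The use of population variance and the function name are assumptions; the paper should be consulted for the exact formulation.

```python
import math
from statistics import mean, pvariance


def contrast_to_noise_ratio(similarity_map, box_mask):
    """CNR between similarities inside and outside a bounding box.

    similarity_map: 2D list of text-image similarity values, one per image patch.
    box_mask: 2D list of the same shape, truthy where the patch lies inside the box.
    """
    inside, outside = [], []
    for sim_row, mask_row in zip(similarity_map, box_mask):
        for value, in_box in zip(sim_row, mask_row):
            (inside if in_box else outside).append(value)
    if not inside or not outside:
        raise ValueError("the mask must select some, but not all, patches")

    # Population variance is an assumption; the source does not specify it.
    noise = math.sqrt(pvariance(inside) + pvariance(outside))
    if noise == 0:
        raise ValueError("similarities are constant inside and outside the box")
    return abs(mean(inside) - mean(outside)) / noise
```

A higher CNR means the similarity map is clearly higher inside the annotated region than elsewhere, without requiring a hard threshold on the map.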

💡 Usage Tip

Additional details about performance can be found in the corresponding paper, Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing.

Limitations

⚠️ Important Note

This model was developed using English corpora, and thus can be considered English-only.

Further information

Please refer to the corresponding paper, "Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing", ECCV'22 for additional details on the model training and evaluation.

For additional inference pipelines with CXR-BERT, please refer to the HI-ML GitHub repository. The associated source files will soon be accessible through this link.

📄 License

This model is released under the MIT license.
