BiomedVLP-BioViL-T

Developed by Microsoft
BioViL-T is a vision-language model focused on analyzing chest X-rays and radiology reports, enhancing performance through temporal multimodal pretraining.
Downloads 26.39k
Release Time: 2/17/2023

Model Overview

BioViL-T is a domain-specific vision-language model for analyzing chest X-rays (CXRs) and radiology reports. It is trained with temporal multimodal pretraining, which incorporates temporal information into the image and text representations as well as their joint embedding space, and this improves performance on multiple downstream tasks.

Model Features

Temporal Multimodal Pretraining
Exploits the temporal structure between data points to improve downstream task performance without changing the training dataset.
Cross-modal Alignment
Aligns text and image embeddings using the latent representation of the [CLS] token for better cross-modal understanding.
Domain-specific Optimization
Optimized specifically for chest X-rays and radiology reports, and performs well on tasks in that domain.
Two-stage Training
The language model is first pretrained on general biomedical text and then trained further on radiology-specific text to specialize it for the radiology domain.
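
To make the cross-modal alignment idea above more concrete, here is a minimal sketch of a symmetric contrastive objective over paired image and text [CLS] vectors. This page does not state the exact loss BioViL-T uses, so the objective, the function name `symmetric_contrastive_loss`, the `temperature` value, and the toy vectors are all assumptions made for illustration, not the model's actual training code.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_cls, text_cls, temperature=0.5):
    """Hypothetical helper: average of image-to-text and text-to-image losses.

    image_cls[i] and text_cls[i] are assumed to come from the same study.
    """
    imgs = [l2_normalize(v) for v in image_cls]
    txts = [l2_normalize(v) for v in text_cls]
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy, hand-written vectors standing in for [CLS] embeddings (not model outputs).
image_cls = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
aligned_text = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0]]
swapped_text = aligned_text[::-1]  # deliberately mismatched pairs

loss_aligned = symmetric_contrastive_loss(image_cls, aligned_text)
loss_swapped = symmetric_contrastive_loss(image_cls, swapped_text)
print(loss_aligned < loss_swapped)
```

In this sketch, correctly paired embeddings yield a lower loss than mismatched ones, which is the property an alignment objective of this kind rewards.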

Model Capabilities

Chest X-ray analysis
Radiology report understanding
Natural language inference
Phrase grounding
Image classification
Text classification
Language decoding
Cross-modal retrieval
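
As an illustration of the cross-modal retrieval capability listed above, the following sketch ranks candidate images against a text query by cosine similarity in a shared embedding space. The embeddings are hand-written toy vectors rather than BioViL-T outputs, and `rank_images` and the study identifiers are hypothetical names introduced only for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_images(text_embedding, image_embeddings):
    """Return (image_id, score) pairs sorted from most to least similar."""
    scored = [
        (image_id, cosine_similarity(text_embedding, emb))
        for image_id, emb in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy image embeddings, assumed to live in the same space as the text query.
image_embeddings = {
    "study_a": [0.9, 0.1, 0.0],
    "study_b": [0.1, 0.9, 0.1],
    "study_c": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 1.0, 0.0]  # stands in for an embedded report sentence

ranking = rank_images(query_embedding, image_embeddings)
best_id = ranking[0][0]
print(best_id)
```

With a real model, the query and image vectors would come from the text and image encoders after projection into the joint space; the ranking step itself would look much the same.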

Use Cases

Medical Imaging Analysis
Chest X-ray Abnormality Detection
Analyzes chest X-rays to detect abnormalities such as pleural effusion or pneumothorax.
Achieves 87.77% accuracy on the MS-CXR-T benchmark.
Radiology Report Generation
Generates or supplements radiology reports based on chest X-rays.
Medical Research
Medical Imaging Language Processing Research
Supports AI researchers exploring research questions in clinical natural language processing (NLP) and vision-language processing (VLP).