Biobart_radiology_summarization Open-source Model - Freely Summarize Radiology Findings into Impressions

Biobart Radiology Summarization

Developed by hamzamalik11

A sequence-to-sequence model based on BioBart for summarizing radiological findings into impressions, trained on 70,000 radiology reports.

Text Generation

Transformers

English

#Radiology Report Summarization #Medical Text Generation #BioBart Fine-tuning

Release Time: 8/1/2023

Model Overview

This model is used to generate accurate and informative impressions from radiology reports, improving communication between radiologists and other healthcare providers.

Model Features

Medical Domain Specialization

Fine-tuned from the biomedical pre-trained model BioBart and optimized specifically for radiology reports

Large-scale Training Data

Trained on 70,000 radiology reports to ensure coverage of various radiological findings

Clinical Communication Optimization

Generated impressions are formatted to meet clinical needs, facilitating quick access to key information by healthcare professionals

Model Capabilities

Radiology Report Summarization

Medical Text Generation

Clinical Information Extraction

Use Cases

Radiology

CT Report Summarization

Summarize detailed CT findings into concise clinical impressions

Improve communication efficiency between radiologists and clinicians

MRI Report Summarization

Extract key findings from complex MRI results and generate summaries

Help clinicians quickly grasp patient conditions

Clinical Decision Support

Emergency Report Quick Interpretation

Quickly generate summaries of key findings from radiological examinations in emergency situations

Reduce emergency decision-making time

🚀 Model Card for Biobart Radiology Summarization

This model can generate accurate and informative radiology impressions, which helps improve communication between radiologists and other healthcare providers.

🚀 Quick Start

Use the code below to get started with the model:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, SummarizationPipeline

# Replace this placeholder with the path or Hub ID of your trained model
model_checkpoint = "attach your trained model here"

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

summarizer = SummarizationPipeline(model=model, tokenizer=tokenizer)

# Input: the findings section of a radiology report
output = summarizer("heart size normal mediastinal hilar contours remain stable small right pneumothorax remains unchanged surgical lung staples overlying left upper lobe seen linear pattern consistent prior upper lobe resection soft tissue osseous structures appear unremarkable nasogastric endotracheal tubes remain satisfactory position atelectatic changes right lower lung field remain unchanged prior study")

# The pipeline returns a list of dicts; the impression is under "summary_text"
print(output[0]["summary_text"])

✨ Features

The model can generate accurate and informative radiology impressions, facilitating better communication between radiologists and other healthcare providers.

📚 Documentation

Model Details

Model Description

This is a BioBart-based sequence-to-sequence model fine-tuned on a custom dataset of 70,000 radiology reports to summarize radiology findings into impressions.

Developed by: Engr. Hamza Iqbal Malik (UET TAXILA)
Shared by: Engr. Hamza Iqbal Malik (UET TAXILA)
Model type: Medical Text Summarization Model
Language(s) (NLP): English
Finetuned from model: GanjinZero/biobart-v2-base

Model Sources

Repository: GanjinZero/biobart-v2-base
Paper: BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model
Demo: hamzamalik11/radiology_summarizer

Uses

Direct Use

The model can be directly used to generate impressions from radiology reports. Users can input the findings of a radiology report, and the model will generate a summarized impression based on that information.
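As a rough sketch of direct use, the helper below runs a summarizer over several findings strings and collects one impression per input. It assumes `summarizer` behaves like the `SummarizationPipeline` from the Quick Start, returning a list with one dict that has a `summary_text` key for each single string. The helper name `generate_impressions` and the whitespace normalization are illustrative assumptions, not part of the original model card.

```python
from typing import Callable, Dict, List


def generate_impressions(
    summarizer: Callable[[str], List[Dict[str, str]]],
    findings: List[str],
) -> List[str]:
    """Return one impression per findings string (hypothetical helper)."""
    impressions = []
    for text in findings:
        # Collapse runs of whitespace; skip empty inputs instead of calling the model.
        cleaned = " ".join(text.split())
        if not cleaned:
            impressions.append("")
            continue
        # Assumption: the pipeline returns [{"summary_text": ...}] for a single string.
        result = summarizer(cleaned)
        impressions.append(result[0]["summary_text"])
    return impressions
```

Because the helper only needs a callable, it can be exercised with a stand-in function before loading the real model.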

Out-of-Scope Use

The model should only be used for generating impressions from radiology reports and is not suitable for tasks outside of radiology report summarization.

Recommendations

Users should be aware of the limitations and potential biases of the model when using the generated impressions for clinical decision-making. Specific recommendations require further information.

Training Details

Training Data

The training data was a custom dataset of 70,000 radiology reports. The data was cleaned to remove personal or confidential information, tokenized, and normalized. It was split into a training set of 63,000 radiology reports and a validation set of 7,000 radiology reports.
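The 63,000 / 7,000 split corresponds to holding out 10% of the reports for validation. The card does not say how the split was made, so the following is only a minimal sketch that assumes a seeded random shuffle; `split_reports`, the seed, and the placeholder data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_reports(
    reports: Sequence[str], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Shuffle reports and split them into (train, validation) lists."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


reports = [f"report {i}" for i in range(70_000)]  # placeholder data, not the real corpus
train, val = split_reports(reports)
print(len(train), len(val))  # 63000 7000
```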

Training Procedure

The model was trained using the Hugging Face Transformers library: https://huggingface.co/transformers/. It was trained with the AdamW optimizer at a learning rate of 5.6e-5 for 10 epochs.

Training Hyperparameters

Training regime:
- evaluation_strategy="epoch"
- learning_rate=5.6e-5
- per_device_train_batch_size=batch_size //4
- per_device_eval_batch_size=batch_size //4
- weight_decay=0.01
- save_total_limit=3
- num_train_epochs=num_train_epochs //4
- predict_with_generate=True //4
- logging_steps=logging_steps
- push_to_hub=False
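One way to read the list above is as keyword arguments for `Seq2SeqTrainingArguments` in the Transformers library. The sketch below gathers them into a plain dictionary. The card does not give values for `batch_size` or `logging_steps`, so the numbers used here are assumptions, `output_dir` is a hypothetical name, and the `//4` suffixes are left out because the card does not explain them.

```python
from typing import Any, Dict


def build_training_kwargs(
    batch_size: int, num_train_epochs: int, logging_steps: int, output_dir: str
) -> Dict[str, Any]:
    """Collect the listed hyperparameters as keyword arguments."""
    return {
        "output_dir": output_dir,        # hypothetical; not stated in the card
        "evaluation_strategy": "epoch",  # named `eval_strategy` in newer library releases
        "learning_rate": 5.6e-5,
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "weight_decay": 0.01,
        "save_total_limit": 3,
        "num_train_epochs": num_train_epochs,
        "predict_with_generate": True,
        "logging_steps": logging_steps,
        "push_to_hub": False,
    }


kwargs = build_training_kwargs(
    batch_size=8,          # assumption
    num_train_epochs=10,   # from the Training Procedure section
    logging_steps=100,     # assumption
    output_dir="biobart-radiology-summarizer",  # hypothetical
)
# With transformers installed, these could be passed on as:
#   from transformers import Seq2SeqTrainingArguments
#   args = Seq2SeqTrainingArguments(**kwargs)
```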

Evaluation

Testing Data, Factors & Metrics

Testing Data

The testing data consisted of 10,000 radiology reports.

Factors

The following factors were evaluated:

- ROUGE-1
- ROUGE-2
- ROUGE-L
- ROUGE-Lsum

Metrics

The following metrics were used to evaluate the model:

- ROUGE-1 score: 44.857
- ROUGE-2 score: 29.015
- ROUGE-L score: 42.032
- ROUGE-Lsum score: 42.038
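To give a feel for what the ROUGE-L number measures, here is a simplified sketch that scores a candidate impression against a reference using the longest common subsequence of whitespace tokens. It is not the scoring code used for the figures above (the card does not name the implementation), and it skips the stemming and tokenization rules that standard ROUGE packages apply. The function names and the example strings are illustrative.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-L F1 in [0, 1] over lowercased whitespace tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "small right pneumothorax unchanged",
    "unchanged small right pneumothorax",
)
print(round(100 * score, 1))  # 75.0 (scaled to 0-100, as in the list above)
```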

Results

The model achieved a ROUGE-L score of 42.032 on the testing data, indicating substantial overlap between its generated impressions and the human-written ones.

Summary

The model was trained on a custom dataset of 70,000 radiology reports and achieved a ROUGE-L score of 42.032 on the testing data, showing that its summaries overlap substantially with human-written impressions.

Model Card Authors

Name: Engr. Hamza Iqbal Malik
LinkedIn: www.linkedin.com/in/hamza-iqbal-malik-42366a239
GitHub: https://github.com/hamza4344

Model Card Contact

Name: Engr. Hamza Iqbal Malik
LinkedIn: www.linkedin.com/in/hamza-iqbal-malik-42366a239
GitHub: https://github.com/hamza4344
