BioBERT Disease NER Model
A disease NER model fine-tuned from BioBERT on the NCBI Disease dataset, delivering high-performance disease extraction (89.04% F1).
🚀 Quick Start
This disease NER model is fine-tuned from BioBERT on the NCBI Disease dataset. It reaches 98.64% accuracy and an 89.04% F1-score, and is optimized for identifying diseases, symptoms, and medical conditions in clinical and biomedical text.
✨ Features
- High Performance: 86.80% precision, 91.39% recall, 89.04% F1-score, and 98.64% accuracy.
- Fine-Tuned: Trained for 5 epochs on 6,800+ annotated examples, with consistently high validation scores.
- Intended Use: Extracts disease mentions from clinical and biomedical documents and supports healthcare AI systems and medical research automation.
📦 Installation
Install the Hugging Face Transformers library and a backend such as PyTorch:
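```bash
pip install transformers torch
```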
💻 Usage Examples
Basic Usage
You can use this model with the Hugging Face Transformers library:
Note: LABEL_0 corresponds to "O" (Outside), LABEL_1 to "B-Disease", and LABEL_2 to "I-Disease", following the BIO tagging format.
```python
from transformers import pipeline

# Load the fine-tuned BioBERT disease NER model.
nlp = pipeline(
    "ner",
    model="Ishan0612/biobert-ner-disease-ncbi",
    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
    aggregation_strategy="simple",  # merge consecutive sub-word tokens with the same label
)

text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."
results = nlp(text)

print("Extracted Medical Entities:")
for entity in results:
    print(f"{entity['word']} - ({entity['entity_group']})")
```
This should output:
```text
Extracted Medical Entities:
the patient has signs of - (LABEL_0)
diabetes - (LABEL_1)
mellitus - (LABEL_2)
and - (LABEL_0)
chronic - (LABEL_1)
obstructive pulmonary disease - (LABEL_2)
. - (LABEL_0)
```
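Because the checkpoint ships with generic LABEL_* names, downstream code often remaps them to readable BIO tags. A minimal post-processing sketch (the label_map dict is illustrative and reuses the results from the example above):

```python
# Rename the generic LABEL_* groups to the BIO tags documented above
# and drop non-entity spans.
label_map = {"LABEL_0": "O", "LABEL_1": "B-Disease", "LABEL_2": "I-Disease"}

for entity in results:
    tag = label_map.get(entity["entity_group"], entity["entity_group"])
    if tag != "O":  # keep only disease spans
        print(f"{entity['word']} - ({tag})")
```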
📚 Documentation
Model Performance
- Precision: 86.80%
- Recall: 91.39%
- F1-Score: 89.04%
- Accuracy: 98.64%
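As a consistency check, the F1-score is the harmonic mean of precision and recall, which matches the figures above: F1 = 2 · P · R / (P + R) = 2 · 0.8680 · 0.9139 / (0.8680 + 0.9139) ≈ 0.8904.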
Intended Use
- Extract disease mentions from clinical and biomedical documents.
- Support healthcare AI systems and medical research automation.
Training Data
This model was trained on the NCBI Disease dataset, which consists of 793 PubMed abstracts with 6,892 disease mentions.
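For inspection, the corpus is also mirrored on the Hugging Face Hub. A minimal sketch, assuming the Hub dataset id ncbi_disease and the 🤗 Datasets library:

```python
# Load the NCBI Disease corpus from the Hugging Face Hub (dataset id assumed).
# Each example carries word-level tokens and integer BIO tags
# (0 = O, 1 = B-Disease, 2 = I-Disease).
from datasets import load_dataset

ds = load_dataset("ncbi_disease")
print(ds)  # train / validation / test splits
example = ds["train"][0]
print(example["tokens"][:10], example["ner_tags"][:10])
```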
Technical Details
The model is fine-tuned from the BioBERT base model (dmis-lab/biobert-base-cased-v1.1) for 5 epochs. It uses the BIO tagging format, where LABEL_0 corresponds to "O" (Outside), LABEL_1 to "B-Disease", and LABEL_2 to "I-Disease".
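If you prefer to skip the pipeline helper, the checkpoint can also be driven directly through the token-classification classes; a minimal sketch, using the same model id as in the usage example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "Ishan0612/biobert-ner-disease-ncbi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

text = "The patient has signs of diabetes mellitus."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# One predicted label id per sub-word token, following the BIO scheme above.
pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, pred_ids):
    print(token, model.config.id2label[label_id.item()])
```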
📄 License
This model is licensed under the Apache 2.0 License, the same as the original BioBERT (dmis-lab/biobert-base-cased-v1.1).
📚 Citation
```bibtex
@article{lee2020biobert,
  title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
  author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and So, Chan Ho and Kang, Jaewoo},
  journal={Bioinformatics},
  volume={36},
  number={4},
  pages={1234--1240},
  year={2020},
  publisher={Oxford University Press}
}
```
| Property | Details |
|----------|---------|
| Model Type | Token Classification |
| Training Data | NCBI Disease dataset (793 PubMed abstracts with 6,892 disease mentions) |
| Base Model | dmis-lab/biobert-base-cased-v1.1 |
| License | Apache 2.0 |
| Metrics | F1, Precision, Recall, Accuracy |