# AI-Powered Symptom Checker

This model predicts potential medical conditions from user-reported symptoms. Built on BERT and fine-tuned on the MedText dataset, it helps users get preliminary symptom insights.
## Quick Start

You can quickly start using this model with the following code:

```python
from transformers import pipeline

model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")
result = model("I have a severe headache and nausea.")
print(result)
```
## Features

- Predicts potential medical conditions based on user-reported symptoms.
- Built on BERT and fine-tuned on the MedText dataset.
- Helps users obtain preliminary insights into their symptoms.
## Installation

Since this model uses the `transformers` library, you can install it via pip:

```bash
pip install transformers
```
## Usage Examples

### Basic Usage

```python
from transformers import pipeline

model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")
result = model("I have a severe headache and nausea.")
print(result)
```
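A `text-classification` pipeline returns a list of dictionaries with `label` and `score` fields. The snippet below shows one way to rank and read such a result; the condition labels here are hypothetical stand-ins, since the actual labels come from the model's `id2label` config:

```python
# Illustrative output handling. The label names below are hypothetical --
# the real labels depend on this model's id2label configuration.
sample_result = [
    {"label": "Migraine", "score": 0.91},
    {"label": "Tension headache", "score": 0.06},
    {"label": "Gastroenteritis", "score": 0.03},
]

# Sort predictions by confidence and report the most likely condition.
ranked = sorted(sample_result, key=lambda r: r["score"], reverse=True)
top = ranked[0]
print(f"Most likely condition: {top['label']} ({top['score']:.0%})")
```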
## Documentation

### Model Details
| Property | Details |
|---|---|
| Model Type | Text Classification |
| Base Model | BERT (`bert-base-uncased`) |
| Training Data | MedText (1.4k medical cases) |
| Metrics | Accuracy: 96.5%, F1-score: 95.1% |
| Intended Use | Assist users in identifying possible conditions based on symptoms |
| Limitations | Not a replacement for professional medical diagnosis |
## Limitations & Ethical Considerations

> **Important:** This model should not be used for medical diagnosis. Always consult a healthcare professional.
## Training Hyperparameters

- Preprocessing: Lowercasing, tokenisation, stopword removal
- Training Framework: Hugging Face `transformers`
- Training Regime: fp32 (full-precision training for stability)
- Batch Size: 16
- Learning Rate: 3e-5
- Epochs: 5
- Optimiser: AdamW
- Scheduler: Linear with warmup
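The hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch: the output directory and warmup step count are illustrative assumptions, and AdamW is the `Trainer` default optimiser, so it needs no explicit flag:

```python
from transformers import TrainingArguments

# Sketch of the training configuration listed above.
# `output_dir` and `warmup_steps` are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="./bert-symptom-checker",
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=3e-5,              # Learning Rate: 3e-5
    num_train_epochs=5,              # Epochs: 5
    lr_scheduler_type="linear",      # Linear schedule...
    warmup_steps=100,                # ...with warmup (step count assumed)
    fp16=False,                      # fp32 full-precision training
)
# AdamW is the default optimiser used by Trainer with these arguments.
```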
## Speeds, Sizes, Times

- Model Checkpoint Size: 4.5GB
- Training Duration: ~3-4 hours on Google Colab
- Throughput: 1,200 samples per minute
## Evaluation

### Testing Data

- Dataset: MedText (1.4k samples)
- Dataset Type: Medical symptom descriptions → condition prediction

### Splits

- Train: 80% (1,120 cases)
- Test: 20% (280 cases)
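An 80/20 split like the one above can be reproduced with, for example, scikit-learn's `train_test_split`. The placeholder corpus and fixed seed below are illustrative assumptions, not the actual MedText preprocessing:

```python
from sklearn.model_selection import train_test_split

# Placeholder corpus standing in for the 1,400 MedText cases.
cases = [f"case_{i}" for i in range(1400)]

# 80/20 split; the random seed is an illustrative assumption.
train_cases, test_cases = train_test_split(cases, test_size=0.2, random_state=42)

print(len(train_cases), len(test_cases))  # 1120 280
```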
### Metrics

| Metric | Score |
|---|---|
| Accuracy | 96.5% |
| F1-Score | 95.1% |
| Precision | 94.7% |
| Recall | 95.5% |
## Results

The model performs strongly in evaluation:

- Accuracy: 96.5% (overall correctness)
- F1-Score: 95.1% (harmonic mean of precision and recall)
- Precision: 94.7% (correct condition predictions out of all predictions made)
- Recall: 95.5% (correct condition predictions out of all actual conditions)
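These metrics can be computed with scikit-learn. The toy labels below are illustrative only, not the model's actual predictions; macro averaging is one common choice for multi-class condition prediction:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground-truth and predicted condition labels (illustrative only).
y_true = ["migraine", "flu", "flu", "migraine", "allergy", "flu"]
y_pred = ["migraine", "flu", "allergy", "migraine", "allergy", "flu"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Macro precision:", precision_score(y_true, y_pred, average="macro"))
print("Macro recall:", recall_score(y_true, y_pred, average="macro"))
```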
### Summary

- Strengths: High recall ensures most conditions are correctly identified.
- Weaknesses: The model may struggle with rare conditions, given the small dataset.
## Model Architecture & Objective

- Architecture: BERT (`bert-base-uncased`) fine-tuned for medical text classification.
- Objective: Predict potential conditions/outcomes from patient symptom descriptions.
## Compute Infrastructure

### Hardware

- Training: Google Colab (NVIDIA T4 GPU, 16GB RAM)
- Inference: Hugging Face Inference API (optimised for CPU/GPU use)

### Software

- Python Version: 3.8
- Deep Learning Framework: PyTorch (`transformers` library)
- Tokeniser: BERT WordPiece tokenizer
- Preprocessing Libraries: nltk, spacy, textacy
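The preprocessing steps listed under Training Hyperparameters (lowercasing, tokenisation, stopword removal) can be sketched in plain Python. The tiny stopword set here is an illustrative stand-in for nltk's English stopword corpus, and the regex tokeniser is an assumption, not the exact pipeline used in training:

```python
import re

# Illustrative subset standing in for nltk's English stopword list.
STOPWORDS = {"i", "a", "an", "the", "and", "have", "has", "of", "in"}

def preprocess(text):
    """Lowercase, tokenise on letter runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("I have a severe headache and nausea."))
# ['severe', 'headache', 'nausea']
```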
## License

This project is licensed under the Apache 2.0 license.