# AI-Powered Symptom Checker

This model predicts potential medical conditions from user-reported symptoms. Built on BERT and fine-tuned on the MedText dataset, it helps users get preliminary symptom insights.
## Quick Start

You can quickly start using this model with the following code:

```python
from transformers import pipeline

model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")
result = model("I have a severe headache and nausea.")
print(result)
```
## Features

- Predicts potential medical conditions based on user-reported symptoms.
- Built on BERT and fine-tuned on the MedText dataset.
- Helps users obtain preliminary insights into their symptoms.
## Installation

Since this model uses the `transformers` library, you can install it via pip:

```bash
pip install transformers
```
## Usage Examples

### Basic Usage

```python
from transformers import pipeline

model = pipeline("text-classification", model="Lech-Iyoko/bert-symptom-checker")
result = model("I have a severe headache and nausea.")
print(result)
```
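A `text-classification` pipeline returns a list of dictionaries with `label` and `score` fields. The snippet below shows one way to rank and read such a result; the condition labels here are hypothetical stand-ins, since the actual labels come from the model's `id2label` config:

```python
# Illustrative output handling. The label names below are hypothetical --
# the real labels depend on this model's id2label configuration.
sample_result = [
    {"label": "Migraine", "score": 0.91},
    {"label": "Tension headache", "score": 0.06},
    {"label": "Gastroenteritis", "score": 0.03},
]

# Sort predictions by confidence and report the most likely condition.
ranked = sorted(sample_result, key=lambda r: r["score"], reverse=True)
top = ranked[0]
print(f"Most likely condition: {top['label']} ({top['score']:.0%})")
```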
## Documentation

### Model Details
| Property | Details |
|---|---|
| Model Type | Text Classification |
| Base Model | BERT (`bert-base-uncased`) |
| Training Data | MedText (1.4k medical cases) |
| Metrics | Accuracy: 96.5%, F1-score: 95.1% |
| Intended Use | Assist users in identifying possible conditions based on symptoms |
| Limitations | Not a replacement for professional medical diagnosis |
## Limitations & Ethical Considerations

> **Important:** This model should not be used for medical diagnosis. Always consult a healthcare professional.
## Training Hyperparameters

- Preprocessing: Lowercasing, tokenisation, stopword removal
- Training Framework: Hugging Face `transformers`
- Training Regime: fp32 (full-precision training for stability)
- Batch Size: 16
- Learning Rate: 3e-5
- Epochs: 5
- Optimiser: AdamW
- Scheduler: Linear with warmup
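The hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch: the output directory and warmup step count are illustrative assumptions, and AdamW is the `Trainer` default optimiser, so it needs no explicit flag:

```python
from transformers import TrainingArguments

# Sketch of the training configuration listed above.
# `output_dir` and `warmup_steps` are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="./bert-symptom-checker",
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=3e-5,              # Learning Rate: 3e-5
    num_train_epochs=5,              # Epochs: 5
    lr_scheduler_type="linear",      # Linear schedule...
    warmup_steps=100,                # ...with warmup (step count assumed)
    fp16=False,                      # fp32 full-precision training
)
# AdamW is the default optimiser used by Trainer with these arguments.
```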
## Speeds, Sizes, Times

- Model Checkpoint Size: 4.5GB
- Training Duration: ~3-4 hours on Google Colab
- Throughput: 1,200 samples per minute
## Evaluation

### Testing Data

- Dataset: MedText (1.4k samples)
- Dataset Type: Medical symptom descriptions → condition prediction

### Splits

- Train: 80% (1,120 cases)
- Test: 20% (280 cases)
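An 80/20 split like the one above can be reproduced with, for example, scikit-learn's `train_test_split`. The placeholder corpus and fixed seed below are illustrative assumptions, not the actual MedText preprocessing:

```python
from sklearn.model_selection import train_test_split

# Placeholder corpus standing in for the 1,400 MedText cases.
cases = [f"case_{i}" for i in range(1400)]

# 80/20 split; the random seed is an illustrative assumption.
train_cases, test_cases = train_test_split(cases, test_size=0.2, random_state=42)

print(len(train_cases), len(test_cases))  # 1120 280
```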
### Metrics

| Metric | Score |
|---|---|
| Accuracy | 96.5% |
| F1-Score | 95.1% |
| Precision | 94.7% |
| Recall | 95.5% |
## Results

The model performs strongly in evaluation:

- Accuracy: 96.5% (overall correctness)
- F1-Score: 95.1% (harmonic mean of precision and recall)
- Precision: 94.7% (correct condition predictions out of all predictions made)
- Recall: 95.5% (correct condition predictions out of all actual conditions)
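These metrics can be computed with scikit-learn. The toy labels below are illustrative only, not the model's actual predictions; macro averaging is one common choice for multi-class condition prediction:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground-truth and predicted condition labels (illustrative only).
y_true = ["migraine", "flu", "flu", "migraine", "allergy", "flu"]
y_pred = ["migraine", "flu", "allergy", "migraine", "allergy", "flu"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Macro precision:", precision_score(y_true, y_pred, average="macro"))
print("Macro recall:", recall_score(y_true, y_pred, average="macro"))
```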
### Summary

- Strengths: High recall ensures most conditions are correctly identified.
- Weaknesses: The model may struggle with rare conditions, given the small dataset.
## Model Architecture & Objective

- Architecture: BERT (`bert-base-uncased`) fine-tuned for medical text classification.
- Objective: Predict potential conditions/outcomes from patient symptom descriptions.
## Compute Infrastructure

### Hardware

- Training: Google Colab (NVIDIA T4 GPU, 16GB RAM)
- Inference: Hugging Face Inference API (optimised for CPU/GPU use)

### Software

- Python Version: 3.8
- Deep Learning Framework: PyTorch (`transformers` library)
- Tokeniser: BERT WordPiece tokenizer
- Preprocessing Libraries: nltk, spacy, textacy
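The preprocessing steps listed under Training Hyperparameters (lowercasing, tokenisation, stopword removal) can be sketched in plain Python. The tiny stopword set here is an illustrative stand-in for nltk's English stopword corpus, and the regex tokeniser is an assumption, not the exact pipeline used in training:

```python
import re

# Illustrative subset standing in for nltk's English stopword list.
STOPWORDS = {"i", "a", "an", "the", "and", "have", "has", "of", "in"}

def preprocess(text):
    """Lowercase, tokenise on letter runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("I have a severe headache and nausea."))
# ['severe', 'headache', 'nausea']
```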
## License

This project is licensed under the Apache 2.0 license.