Bio_ClinicalBERT_DDI_finetuned
This model predicts Drug-Drug Interactions (DDIs) from the chemical structures of two drugs, returning the probability that they interact; its results on a held-out test set are reported below.
Quick Start
This model was initialized from Bio_ClinicalBERT by adding three hidden layers after the BERT pooler layer (a sketch of such a head follows the results below). The model was trained on a Drug-Drug Interaction dataset extracted from the DrugBank database and the National Library of Medicine API.
It achieves the following results on the test set:
- F2: 0.7872
- AUPRC: 0.869
- Recall: 0.7849
- Precision: 0.7967
- MCC: 0.3779
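As an illustration of the architecture described above, here is a minimal sketch; the hidden-layer sizes and the two-class output are assumptions, not the author's published code:

```python
import torch.nn as nn
from transformers import AutoModel

class DDIClassifier(nn.Module):
    """Bio_ClinicalBERT with three hidden layers after the pooler (sizes assumed)."""

    def __init__(self, hidden_sizes=(512, 256, 64)):
        super().__init__()
        self.bert = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
        dims = [self.bert.config.hidden_size, *hidden_sizes]
        layers = []
        for d_in, d_out in zip(dims, dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.hidden = nn.Sequential(*layers)
        self.classifier = nn.Linear(dims[-1], 2)  # interaction / no interaction

    def forward(self, input_ids, attention_mask):
        # Classify from the pooled [CLS] representation of the drug pair.
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(self.hidden(pooled))
```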
Features
- Accurate Prediction: Predicts Drug-Drug Interactions (DDIs) from the chemical structures of two drugs and returns the probability of interaction.
- Data Isolation: To avoid data leakage and to support prediction for new drugs, the drugs in the validation and test sets are excluded from the training set.
Documentation
Model description
Predicts Drug-Drug Interactions (DDIs) from the chemical structures of two drugs. The model returns the probability that the two drugs interact with each other.
Intended uses & limitations
To construct the input, use the "[SEP]" token to separate the two drugs' SMILES strings. A properly constructed input looks like this:

```python
drug1 = "[Ca++].[O-]C([O-])=O"
drug2 = "OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"
correct_input = "[Ca++].[O-]C([O-])=O [SEP] OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"
```
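A minimal inference sketch using this input format follows. The model id is a placeholder, and it assumes the checkpoint loads with `AutoModelForSequenceClassification` and that class index 1 means "interaction"; the custom three-layer head may instead require the author's own model class:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Bio_ClinicalBERT_DDI_finetuned"  # placeholder; substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

drug1 = "[Ca++].[O-]C([O-])=O"
drug2 = "OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"
text = f"{drug1} [SEP] {drug2}"  # the input format described above

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
prob = torch.softmax(logits, dim=-1)[0, 1].item()  # assumes index 1 = interaction
print(f"P(interaction) = {prob:.3f}")
```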
Training and evaluation data
To avoid data leakage and to enable DDI prediction for new drugs, no drug appearing as drug1 or drug2 in the validation or test set was included in the training set; their SMILES chemical structures were never exposed to the training process.
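One way to implement such a drug-disjoint split is sketched below; this is an illustration rather than the author's exact procedure, and the column names (`drug1`, `drug2`) are assumed:

```python
import pandas as pd

def drug_disjoint_split(pairs: pd.DataFrame, holdout_drugs: set):
    """Split DDI pairs so held-out drugs never appear in the training set.

    Illustrative only: assumes columns 'drug1' and 'drug2' contain SMILES strings.
    """
    touches_holdout = pairs["drug1"].isin(holdout_drugs) | pairs["drug2"].isin(holdout_drugs)
    eval_pairs = pairs[touches_holdout]    # pairs involving a held-out drug
    train_pairs = pairs[~touches_holdout]  # pairs between training-only drugs
    return train_pairs, eval_pairs
```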
Training procedure
Training was performed on an AWS EC2 g5.4xlarge instance with a 24 GB GPU.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.01
- train_batch_size: 32
- eval_batch_size: 32
- seed: 7
- optimizer: Adadelta with weight_decay = 1e-04
- lr_scheduler_type: CosineAnnealingLR
- num_epochs: 4
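These settings translate directly to PyTorch; the sketch below is a minimal illustration (the `model` variable and per-epoch scheduler stepping are assumptions):

```python
import torch

optimizer = torch.optim.Adadelta(model.parameters(), lr=0.01, weight_decay=1e-4)
# Cosine annealing across the 4 epochs; to step per batch, use total steps as T_max instead.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=4)

for epoch in range(4):
    # ... one training pass over the data at batch size 32 ...
    scheduler.step()
```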
Training results
| Training Loss | Epoch | Validation Loss | F2 | Recall | Precision | MCC |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6068 | 1.0 | 0.7061 | 0.6508 | 0.6444 | 0.6778 | 0.2514 |
| 0.4529 | 2.0 | 0.8334 | 0.7555 | 0.7727 | 0.6939 | 0.3451 |
| 0.3375 | 3.0 | 0.9582 | 0.7636 | 0.7840 | 0.6915 | 0.3474 |
| 0.2624 | 4.0 | 1.2588 | 0.7770 | 0.8004 | 0.6954 | 0.3654 |
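For reference, the reported metrics can be computed with scikit-learn as sketched below, where `y_true`, `y_pred`, and `y_score` stand for the true labels, thresholded predictions, and predicted interaction probabilities:

```python
from sklearn.metrics import (
    average_precision_score,  # AUPRC
    fbeta_score,              # F2 when beta=2 (weights recall over precision)
    matthews_corrcoef,        # MCC
    precision_score,
    recall_score,
)

f2 = fbeta_score(y_true, y_pred, beta=2)
recall = recall_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
auprc = average_precision_score(y_true, y_score)
```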
Framework versions
- Transformers 4.30.2
- Pytorch 2.0.1
- Datasets 2.13.1
- Tokenizers 0.13.3
License
No license information is provided.
Metadata
| Property | Details |
|---|---|
| Tags | generated_from_trainer, chemistry, medical, drug_drug_interaction |
| Metrics | f2-score, recall, precision, mcc |
| Model Name | Bio_ClinicalBERT_DDI_finetuned |
| Task | Drug-Drug Interaction Classification (text-classification) |
| Dataset | DrugBank (REST API) |
| Recall Value | 0.7849 |
| Widget Text | `[Ca++].[O-]C([O-])=O [SEP] OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO` |
| Widget Example Title | "Drug1 [SEP] Drug2" |
| Pipeline Tag | text-classification |