Bio_ClinicalBERT_DDI_finetuned
This model predicts Drug-Drug Interactions (DDIs) from the chemical structures of two drugs, returning the probability that they interact; its results on a held-out test set are reported below.
Quick Start
This model was initialized from Bio_ClinicalBERT by adding three hidden layers after the BERT pooler layer (a sketch of such a head follows the results below). The model was trained on a Drug-Drug Interaction dataset extracted from the DrugBank database and the National Library of Medicine API.
It achieves the following results on the test set:
- F2: 0.7872
- AUPRC: 0.869
- Recall: 0.7849
- Precision: 0.7967
- MCC: 0.3779
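As an illustration of the architecture described above, here is a minimal sketch; the hidden-layer sizes and the two-class output are assumptions, not the author's published code:

```python
import torch.nn as nn
from transformers import AutoModel

class DDIClassifier(nn.Module):
    """Bio_ClinicalBERT with three hidden layers after the pooler (sizes assumed)."""

    def __init__(self, hidden_sizes=(512, 256, 64)):
        super().__init__()
        self.bert = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
        dims = [self.bert.config.hidden_size, *hidden_sizes]
        layers = []
        for d_in, d_out in zip(dims, dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.hidden = nn.Sequential(*layers)
        self.classifier = nn.Linear(dims[-1], 2)  # interaction / no interaction

    def forward(self, input_ids, attention_mask):
        # Classify from the pooled [CLS] representation of the drug pair.
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(self.hidden(pooled))
```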
Features
- Accurate Prediction: Predicts Drug-Drug Interactions (DDIs) from the chemical structures of two drugs and returns the probability of interaction.
- Data Isolation: To avoid data leakage and to support prediction for new drugs, the drugs in the validation and test sets are excluded from the training set.
Documentation
Model description
Predicts Drug-Drug Interactions (DDIs) from the chemical structures of two drugs. The model returns the probability that the two drugs interact with each other.
Intended uses & limitations
To construct the input, use the "[SEP]" token to separate the two drugs' SMILES strings. A properly constructed input looks like this:

```python
drug1 = "[Ca++].[O-]C([O-])=O"
drug2 = "OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"
correct_input = "[Ca++].[O-]C([O-])=O [SEP] OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"
```
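A minimal inference sketch using this input format follows. The model id is a placeholder, and it assumes the checkpoint loads with `AutoModelForSequenceClassification` and that class index 1 means "interaction"; the custom three-layer head may instead require the author's own model class:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Bio_ClinicalBERT_DDI_finetuned"  # placeholder; substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

drug1 = "[Ca++].[O-]C([O-])=O"
drug2 = "OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO"
text = f"{drug1} [SEP] {drug2}"  # the input format described above

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
prob = torch.softmax(logits, dim=-1)[0, 1].item()  # assumes index 1 = interaction
print(f"P(interaction) = {prob:.3f}")
```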
Training and evaluation data
To avoid data leakage and to enable DDI prediction for new drugs, no drug appearing as drug1 or drug2 in the validation or test set was included in the training set; their SMILES chemical structures were never exposed to the training process.
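One way to implement such a drug-disjoint split is sketched below; this is an illustration rather than the author's exact procedure, and the column names (`drug1`, `drug2`) are assumed:

```python
import pandas as pd

def drug_disjoint_split(pairs: pd.DataFrame, holdout_drugs: set):
    """Split DDI pairs so held-out drugs never appear in the training set.

    Illustrative only: assumes columns 'drug1' and 'drug2' contain SMILES strings.
    """
    touches_holdout = pairs["drug1"].isin(holdout_drugs) | pairs["drug2"].isin(holdout_drugs)
    eval_pairs = pairs[touches_holdout]    # pairs involving a held-out drug
    train_pairs = pairs[~touches_holdout]  # pairs between training-only drugs
    return train_pairs, eval_pairs
```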
Training procedure
Training was performed on an AWS EC2 g5.4xlarge instance with a 24 GB GPU.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.01
- train_batch_size: 32
- eval_batch_size: 32
- seed: 7
- optimizer: Adadelta with weight_decay = 1e-04
- lr_scheduler_type: CosineAnnealingLR
- num_epochs: 4
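These settings translate directly to PyTorch; the sketch below is a minimal illustration (the `model` variable and per-epoch scheduler stepping are assumptions):

```python
import torch

optimizer = torch.optim.Adadelta(model.parameters(), lr=0.01, weight_decay=1e-4)
# Cosine annealing across the 4 epochs; to step per batch, use total steps as T_max instead.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=4)

for epoch in range(4):
    # ... one training pass over the data at batch size 32 ...
    scheduler.step()
```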
Training results
| Training Loss | Epoch | Validation Loss | F2 | Recall | Precision | MCC |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6068 | 1.0 | 0.7061 | 0.6508 | 0.6444 | 0.6778 | 0.2514 |
| 0.4529 | 2.0 | 0.8334 | 0.7555 | 0.7727 | 0.6939 | 0.3451 |
| 0.3375 | 3.0 | 0.9582 | 0.7636 | 0.7840 | 0.6915 | 0.3474 |
| 0.2624 | 4.0 | 1.2588 | 0.7770 | 0.8004 | 0.6954 | 0.3654 |
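For reference, the reported metrics can be computed with scikit-learn as sketched below, where `y_true`, `y_pred`, and `y_score` stand for the true labels, thresholded predictions, and predicted interaction probabilities:

```python
from sklearn.metrics import (
    average_precision_score,  # AUPRC
    fbeta_score,              # F2 when beta=2 (weights recall over precision)
    matthews_corrcoef,        # MCC
    precision_score,
    recall_score,
)

f2 = fbeta_score(y_true, y_pred, beta=2)
recall = recall_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
auprc = average_precision_score(y_true, y_score)
```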
Framework versions
- Transformers 4.30.2
- Pytorch 2.0.1
- Datasets 2.13.1
- Tokenizers 0.13.3
License
No license information is provided.
Metadata
| Property | Details |
|---|---|
| Tags | generated_from_trainer, chemistry, medical, drug_drug_interaction |
| Metrics | f2-score, recall, precision, mcc |
| Model Name | Bio_ClinicalBERT_DDI_finetuned |
| Task | Drug-Drug Interaction Classification (text-classification) |
| Dataset | DrugBank (REST API) |
| Recall Value | 0.7849 |
| Widget Text | `[Ca++].[O-]C([O-])=O [SEP] OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)CO` |
| Widget Example Title | "Drug1 [SEP] Drug2" |
| Pipeline Tag | text-classification |