DrugGPT
A generative drug design model based on GPT-2 that aims to revolutionize drug design with natural language processing techniques.
Quick Start
DrugGPT is a generative drug design strategy based on the GPT architecture. It applies a GPT model to explore chemical space and discover new molecules with the potential to bind specific proteins. Trained on up to 1.8 million protein-ligand binding data points, it offers a fast, efficient way to generate drug candidate molecules.
Features
- Innovative Approach: Applies natural language processing to drug design.
- Data-Driven: Trained on a large dataset of protein-ligand binding data.
- Flexible Usage: Accepts the target protein either as an amino acid sequence or as a FASTA file, with an optional ligand prompt.
Installation
- Clone the Repository
git clone https://github.com/LIYUESEN/druggpt.git
cd druggpt
Alternatively, open the GitHub repository and click Code > Download ZIP.
- Create a Virtual Environment
conda create -n druggpt python=3.7
conda activate druggpt
- Install Python Dependencies
pip install datasets transformers scipy scikit-learn
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
conda install -c openbabel openbabel
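After installation, a short check can confirm that the packages above are importable and suggest a value for the -d flag. This is a sketch, not part of the repository: it assumes the import names datasets, transformers, scipy, sklearn (for scikit-learn), torch, and openbabel (the Open Babel 3.x Python bindings), and it falls back to 'cpu' when PyTorch is missing or sees no GPU.

```python
import importlib.util

# Assumed import names for the packages installed above (scikit-learn imports as sklearn).
REQUIRED_MODULES = ["datasets", "transformers", "scipy", "sklearn", "torch", "openbabel"]


def check_modules(names):
    """Map each top-level module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def suggest_device():
    """Suggest a value for -d: 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print("{:<14} {}".format(name, "ok" if found else "MISSING"))
    print("suggested -d value:", suggest_device())
```

Run it inside the activated druggpt environment; any module reported as MISSING points to a step above that did not complete.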
Usage Examples
Basic Usage
Run drug_generator.py with the following parameters:
-p | --pro_seq: Protein amino acid sequence.
-f | --fasta: Path to a protein FASTA file.
Specify only one of -p and -f.
-l | --ligand_prompt: Ligand prompt (a SMILES prefix, as in the examples below).
-e | --empty_input: Enable direct generation mode.
-n | --number: Minimum number of molecules to generate.
-d | --device: Hardware device to use. Default is 'cuda'.
-o | --output: Output directory for generated molecules. Default is './ligand_output/'.
-b | --batch_size: Number of molecules generated per batch. Reduce this value if you run low on memory. Default is 32.
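The interface described above can be pictured with Python's argparse. This is a hypothetical sketch, not the actual source of drug_generator.py: the help strings are paraphrased from the list above, and flags without a documented default (-p, -f, -l, -n) default to None here. The mutually exclusive group enforces the rule that only one of -p and -f may be given.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented for drug_generator.py."""
    parser = argparse.ArgumentParser(description="Sketch of the drug_generator.py CLI")
    # -p and -f are alternative ways to supply the protein; at most one is accepted.
    protein = parser.add_mutually_exclusive_group()
    protein.add_argument("-p", "--pro_seq", type=str, help="protein amino acid sequence")
    protein.add_argument("-f", "--fasta", type=str, help="path to a protein FASTA file")
    parser.add_argument("-l", "--ligand_prompt", type=str, help="ligand prompt (SMILES prefix)")
    parser.add_argument("-e", "--empty_input", action="store_true",
                        help="enable direct generation mode")
    # No default is documented for -n, so this sketch leaves it as None.
    parser.add_argument("-n", "--number", type=int, help="minimum number of molecules")
    # Defaults below are the ones stated in the README.
    parser.add_argument("-d", "--device", type=str, default="cuda", help="hardware device")
    parser.add_argument("-o", "--output", type=str, default="./ligand_output/",
                        help="output directory for generated molecules")
    parser.add_argument("-b", "--batch_size", type=int, default=32,
                        help="molecules generated per batch")
    return parser
```

For example, build_parser().parse_args(["-f", "bcl2.fasta", "-n", "50"]) yields fasta='bcl2.fasta', number=50, and the documented defaults for device, output, and batch_size. Passing both -p and -f makes argparse exit with a usage error; the real script may handle that case differently.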
Specific Examples
- Input a Protein FASTA File
python drug_generator.py -f bcl2.fasta -n 50
- Input the Amino Acid Sequence of the Protein
python drug_generator.py -p MAKQPSDVSSECDREGRQLQPAERPPQLRPGAPTSLQTEPQGNPEGNHGGEGDSCPHGSPQGPLAPPASPGPFATRSPLFIFMRRSSLLSRSSSGYFSFDTDRSPAPMSCDKSTQTPSPPCQAFNHYLSAMASMRQAEPADMRPEIWIAQELRRIGDEFNAYYARRVFLNNYQAAEDHPRMVILRLLRYIVRLVWRMH -n 50
- Provide a Prompt for the Ligand
python drug_generator.py -f bcl2.fasta -l COc1ccc(cc1)C(=O) -n 50
- Note for Linux Environment
On Linux and other Unix-like shells such as bash, enclose the ligand prompt in single quotes (''); otherwise the shell interprets the parentheses in the SMILES string as its own syntax.
python drug_generator.py -f bcl2.fasta -l 'COc1ccc(cc1)C(=O)' -n 50
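The same runs can be launched from Python, which avoids shell quoting entirely. In the sketch below, write_fasta, build_command, and run_generator are hypothetical helpers, not part of DrugGPT. The FASTA file is written with the sequence on a single line, which is valid FASTA; the assumption is that the script's reader accepts it. Because the command is passed to subprocess as a list and no shell is involved, the parentheses in the ligand prompt arrive unchanged on any platform.

```python
import subprocess
import sys
from pathlib import Path


def write_fasta(path, header, sequence):
    """Write a single-record FASTA file (header line, then the sequence on one line)."""
    Path(path).write_text(">" + header + "\n" + sequence + "\n")
    return path


def build_command(fasta_path, ligand_prompt=None, number=50):
    """Build the drug_generator.py call as an argument list; no shell parses it."""
    # sys.executable reuses the current interpreter, e.g. the druggpt conda env's Python.
    cmd = [sys.executable, "drug_generator.py", "-f", str(fasta_path), "-n", str(number)]
    if ligand_prompt:
        # Passed verbatim as one argument, so the SMILES parentheses need no quoting.
        cmd += ["-l", ligand_prompt]
    return cmd


def run_generator(cmd):
    """Run the command from the repository root; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)
```

For example, run_generator(build_command(write_fasta("my_protein.fasta", "my_protein", seq), "COc1ccc(cc1)C(=O)", 50)) mirrors the quoted Linux command above, where my_protein.fasta and seq stand for your own file name and amino acid sequence.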
Documentation
How to Cite This Work
DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins
Yuesen Li, Chengyi Gao, Xin Song, Xiangyu Wang, Yungang Xu, Suxia Han
bioRxiv 2023.06.29.543848; doi: https://doi.org/10.1101/2023.06.29.543848

License
GNU General Public License v3.0