🚀 CodonTransformer
CodonTransformer is a multispecies codon optimization tool: it converts protein sequences into DNA sequences optimized for a chosen target organism. Built on a Transformer architecture and accompanied by a user-friendly Jupyter notebook, it is aimed at genetic engineering researchers and practitioners who want to make codon optimization faster and simpler.
🚀 Quick Start
Installation
CodonTransformer is distributed as a package on PyPI (pip install CodonTransformer); see the PyPI package page for full installation instructions.
Usage
For an interactive demo, check out our Google Colab Notebook.
After installing CodonTransformer, you can use the following code:
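import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output

# Use a GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pretrained tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(device)

# Protein to optimize and the target organism
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"

# Predict an optimized DNA sequence for the protein in the target organism
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=device,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",  # BigBird full attention rather than block-sparse
    deterministic=True,  # request reproducible (non-sampled) output
)

# Pretty-print the organism, input protein, processed input, and predicted DNA
print(format_model_output(output))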
The output is:
-----------------------------
| Organism |
-----------------------------
Escherichia coli general
-----------------------------
| Input Protein |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
-----------------------------
| Processed Input |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK
-----------------------------
| Predicted DNA |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
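Because the predicted DNA should encode exactly the input protein, a quick sanity check is to translate it back and compare. The sketch below is not part of the CodonTransformer package: translate_dna is a hypothetical helper that applies the standard genetic code (NCBI table 1, whose codon-to-amino-acid assignments E. coli shares), and the DNA string is copied from the output above.

```python
# Standard genetic code, built from the conventional TCAG ordering:
# the first, second, and third codon positions each cycle through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_dna(dna: str) -> str:
    """Hypothetical helper: translate a coding sequence codon by codon.

    Stop codons are rendered as '*'.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"

# Predicted DNA copied from the example output above
predicted_dna = (
    "ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGG"
    "GGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCAC"
    "CTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACG"
    "CCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTG"
    "GGCGGCTAA"
)

translated = translate_dna(predicted_dna)
print(translated.endswith("*"))     # True: the sequence ends in a stop codon (TAA)
print(translated[:-1] == protein)   # True: the DNA encodes the input protein
```

The same check works for any prediction: a mismatch or an internal '*' would indicate that the DNA does not faithfully encode the input protein.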
✨ Features
- Codon Optimization: Convert protein sequences into DNA sequences optimized for a specific target organism.
- User-Friendly: A Jupyter notebook interface and a Transformer-based model simplify the codon optimization process.
👥 Authors
Adibvafa Fallahpour¹,²*, Vincent Gureghian³*, Guillaume J. Filion²‡, Ariel B. Lindner³‡, Amir Pandi³‡
¹ Vector Institute for Artificial Intelligence, Toronto ON, Canada
² University of Toronto Scarborough; Department of Biological Science; Scarborough ON, Canada
³ Université Paris Cité, INSERM U1284, Center for Research and Interdisciplinarity, F-75006 Paris, France
* These authors contributed equally to this work.
‡ To whom correspondence should be addressed:
guillaume.filion@utoronto.ca, ariel.lindner@inserm.fr, amir.pandi@cri-paris.org
📄 License
This project is licensed under the Apache 2.0 License.
📖 Citation
@article{Fallahpour_Gureghian_Filion_Lindner_Pandi_2025,
title={CodonTransformer: a multispecies codon optimizer using context-aware neural networks},
volume={16},
ISSN={2041-1723},
url={https://www.nature.com/articles/s41467-025-58588-7},
DOI={10.1038/s41467-025-58588-7},
number={1},
journal={Nature Communications},
author={Fallahpour, Adibvafa and Gureghian, Vincent and Filion, Guillaume J. and Lindner, Ariel B. and Pandi, Amir},
year={2025},
month=apr,
pages={3205},
language={en}
}