Enumeration-Aware Molecular Transformers
A suite of tools for pre-training and fine-tuning SMILES-based molecular language models, with semi-supervised learning recipes for low-data scenarios.
Quick Start
This project introduces a set of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. It also offers semi-supervised learning recipes for fine-tuning these models in low-data settings.
Features
Enumeration-aware Molecular Transformers
The pre-training recipe combines contrastive learning, multi-task regression, and masked language modelling as pre-training objectives to incorporate enumeration knowledge into pre-trained language models.
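The contrastive objective above can be illustrated with a minimal numpy sketch of an InfoNCE-style loss, where each anchor embedding (e.g. a canonical SMILES) should be most similar to its own positive (e.g. an enumerated SMILES of the same molecule) among all positives in the batch. Function names and the temperature value here are illustrative, not the repository's API.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    """InfoNCE contrastive loss over a batch of paired embeddings.

    Each row of `anchors` is pulled toward the matching row of
    `positives` and pushed away from all other rows in the batch.
    """
    # L2-normalise so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (batch, batch)
    # Cross-entropy with the diagonal as the correct "class"
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Perfectly matched pairs give a near-zero loss; shifting the pairing
# by one position (every anchor matched to the wrong positive) gives a
# large loss.
anchors = np.eye(4)
matched = info_nce_loss(anchors, anchors)
mismatched = info_nce_loss(anchors, np.roll(anchors, 1, axis=0))
```

In actual pre-training the embeddings would come from the transformer encoder rather than being fixed vectors, and the loss would be minimised end-to-end.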
a. Molecular Domain Adaptation (Contrastive Encoder-based)
i. Architecture

ii. Contrastive Learning
b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
Pretraining steps for this model:
- Pretrain a BERT model with multi-task regression on physicochemical properties using the Guacamol dataset.
- Perform domain adaptation on the MUV dataset with contrastive learning using a Siamese BERT architecture.
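The first step above, multi-task regression, can be sketched with a toy numpy example: a single linear head maps a pooled SMILES embedding to several property targets at once, trained with a shared MSE loss. The embeddings, property count, and learning rate here are stand-ins, not the repository's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins: pooled SMILES embeddings (batch, hidden) and normalised
# physicochemical property targets (batch, n_properties) -- e.g. logP,
# molecular weight, TPSA (the property choice is illustrative).
embeddings = rng.normal(size=(8, 16))
targets = rng.normal(size=(8, 3))

# A multi-task regression head: one shared weight matrix predicts all
# property targets from the same embedding.
W = rng.normal(scale=0.1, size=(16, 3))
b = np.zeros(3)

def mtr_loss(W, b):
    preds = embeddings @ W + b
    return np.mean((preds - targets) ** 2)  # MSE averaged over all tasks

loss_before = mtr_loss(W, b)

# One step of plain gradient descent on the head (encoder frozen here).
preds = embeddings @ W + b
grad_W = 2 * embeddings.T @ (preds - targets) / targets.size
grad_b = 2 * (preds - targets).sum(axis=0) / targets.size
lr = 0.1
loss_after = mtr_loss(W - lr * grad_W, b - lr * grad_b)
```

During real pre-training the gradient also flows into the BERT encoder, so the regression signal shapes the molecular representations themselves.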
For more details, please see our [GitHub repository](https://github.com/uds-lsv/enumeration-aware-molecule-transformers).
Virtual Screening Benchmark ([GitHub Repository](https://github.com/MoleculeTransformers/rdkit-benchmarking-platform-transformers))
The original version was presented in:
S. Riniker, G. Landrum, J. Cheminf., 5, 26 (2013),
DOI: 10.1186/1758-2946-5-26,
URL: http://www.jcheminf.com/content/5/1/26
The extended version was presented in:
S. Riniker, N. Fechner, G. Landrum, J. Chem. Inf. Model., 53, 2829 (2013),
DOI: 10.1021/ci400466r,
URL: http://pubs.acs.org/doi/abs/10.1021/ci400466r
Documentation
Model List
Our released models are listed below. You can load them with the smiles-featurizers package or with HuggingFace Transformers.
| Property | Details |
| --- | --- |
| Model Type | Bert, Bart, Simcse, SentenceTransformer |
| Training Data | jxie/guacamol, AdrianM0/MUV |
| Model | Type | AUROC | BEDROC |
| --- | --- | --- | --- |
| [UdS-LSV/smole-bert](https://huggingface.co/UdS-LSV/smole-bert) | Bert | 0.615 | 0.225 |
| [UdS-LSV/smole-bert-mtr](https://huggingface.co/UdS-LSV/smole-bert-mtr) | Bert | 0.621 | 0.262 |
| [UdS-LSV/smole-bart](https://huggingface.co/UdS-LSV/smole-bart) | Bart | 0.660 | 0.263 |
| [UdS-LSV/muv2x-simcse-smole-bart](https://huggingface.co/UdS-LSV/muv2x-simcse-smole-bert) | Simcse | 0.697 | 0.270 |
| [UdS-LSV/siamese-smole-bert-muv-1x](https://huggingface.co/UdS-LSV/siamese-smole-bert-muv-1x) | SentenceTransformer | 0.673 | 0.274 |
License
This project is licensed under the Apache-2.0 license.