Enumeration-Aware Molecular Transformers
A suite of tools for pre-training and fine-tuning SMILES-based molecular language models, with semi-supervised learning recipes for low-data scenarios.
Quick Start
This project introduces a set of neural language model tools for pre-training and fine-tuning SMILES-based molecular language models. It also offers semi-supervised learning recipes for fine-tuning these models in low-data settings.
Features
Enumeration-aware Molecular Transformers
The pre-training recipe combines contrastive learning, multi-task regression, and masked language modelling as pre-training objectives to incorporate enumeration knowledge into pre-trained language models.
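The contrastive objective above can be illustrated with a minimal numpy sketch of an InfoNCE-style loss, where each anchor embedding (e.g. a canonical SMILES) should be most similar to its own positive (e.g. an enumerated SMILES of the same molecule) among all positives in the batch. Function names and the temperature value here are illustrative, not the repository's API.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    """InfoNCE contrastive loss over a batch of paired embeddings.

    Each row of `anchors` is pulled toward the matching row of
    `positives` and pushed away from all other rows in the batch.
    """
    # L2-normalise so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (batch, batch)
    # Cross-entropy with the diagonal as the correct "class"
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Perfectly matched pairs give a near-zero loss; shifting the pairing
# by one position (every anchor matched to the wrong positive) gives a
# large loss.
anchors = np.eye(4)
matched = info_nce_loss(anchors, anchors)
mismatched = info_nce_loss(anchors, np.roll(anchors, 1, axis=0))
```

In actual pre-training the embeddings would come from the transformer encoder rather than being fixed vectors, and the loss would be minimised end-to-end.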
a. Molecular Domain Adaptation (Contrastive Encoder-based)
i. Architecture

ii. Contrastive Learning
b. Canonicalization Encoder-decoder (Denoising Encoder-decoder)
Pretraining steps for this model:
- Pretrain a BERT model with multi-task regression on physicochemical properties using the Guacamol dataset.
- Perform domain adaptation on the MUV dataset with contrastive learning using a Siamese BERT architecture.
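The first step above, multi-task regression, can be sketched with a toy numpy example: a single linear head maps a pooled SMILES embedding to several property targets at once, trained with a shared MSE loss. The embeddings, property count, and learning rate here are stand-ins, not the repository's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins: pooled SMILES embeddings (batch, hidden) and normalised
# physicochemical property targets (batch, n_properties) -- e.g. logP,
# molecular weight, TPSA (the property choice is illustrative).
embeddings = rng.normal(size=(8, 16))
targets = rng.normal(size=(8, 3))

# A multi-task regression head: one shared weight matrix predicts all
# property targets from the same embedding.
W = rng.normal(scale=0.1, size=(16, 3))
b = np.zeros(3)

def mtr_loss(W, b):
    preds = embeddings @ W + b
    return np.mean((preds - targets) ** 2)  # MSE averaged over all tasks

loss_before = mtr_loss(W, b)

# One step of plain gradient descent on the head (encoder frozen here).
preds = embeddings @ W + b
grad_W = 2 * embeddings.T @ (preds - targets) / targets.size
grad_b = 2 * (preds - targets).sum(axis=0) / targets.size
lr = 0.1
loss_after = mtr_loss(W - lr * grad_W, b - lr * grad_b)
```

During real pre-training the gradient also flows into the BERT encoder, so the regression signal shapes the molecular representations themselves.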
For more details, please see our [GitHub repository](https://github.com/uds-lsv/enumeration-aware-molecule-transformers).
Virtual Screening Benchmark ([GitHub Repository](https://github.com/MoleculeTransformers/rdkit-benchmarking-platform-transformers))
The original version was presented in:
S. Riniker, G. Landrum, J. Cheminf., 5, 26 (2013),
DOI: 10.1186/1758-2946-5-26,
URL: http://www.jcheminf.com/content/5/1/26
The extended version was presented in:
S. Riniker, N. Fechner, G. Landrum, J. Chem. Inf. Model., 53, 2829 (2013),
DOI: 10.1021/ci400466r,
URL: http://pubs.acs.org/doi/abs/10.1021/ci400466r
Documentation
Model List
Our released models are listed below. You can load them with the smiles-featurizers package or with HuggingFace Transformers.
| Property | Details |
| --- | --- |
| Model Type | Bert, Bart, Simcse, SentenceTransformer |
| Training Data | jxie/guacamol, AdrianM0/MUV |
| Model | Type | AUROC | BEDROC |
| --- | --- | --- | --- |
| [UdS-LSV/smole-bert](https://huggingface.co/UdS-LSV/smole-bert) | Bert | 0.615 | 0.225 |
| [UdS-LSV/smole-bert-mtr](https://huggingface.co/UdS-LSV/smole-bert-mtr) | Bert | 0.621 | 0.262 |
| [UdS-LSV/smole-bart](https://huggingface.co/UdS-LSV/smole-bart) | Bart | 0.660 | 0.263 |
| [UdS-LSV/muv2x-simcse-smole-bart](https://huggingface.co/UdS-LSV/muv2x-simcse-smole-bert) | Simcse | 0.697 | 0.270 |
| [UdS-LSV/siamese-smole-bert-muv-1x](https://huggingface.co/UdS-LSV/siamese-smole-bert-muv-1x) | SentenceTransformer | 0.673 | 0.274 |
License
This project is licensed under the Apache-2.0 license.