ChemBERTa Zinc Base V1
Developed by seyonec
A Transformer model based on the RoBERTa architecture, pre-trained with a masked language modeling objective on chemical SMILES strings
Downloads 323.83k
Release Time: 3/2/2022
Model Overview
This model learns representations of chemical SMILES strings through pre-training. It can predict masked tokens in molecular sequences, and its learned representations can be applied to molecular property prediction and structural analysis
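Masked-token prediction as described above can be sketched with the Hugging Face transformers fill-mask pipeline. This is a minimal sketch, not the author's reference code: the Hub identifier seyonec/ChemBERTa-zinc-base-v1 and the helper names mask_smiles and predict_masked are assumptions made for illustration, and the mask token is assumed to be RoBERTa's usual <mask>.

```python
# Assumed Hub identifier for this model; adjust if the checkpoint lives elsewhere.
MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"


def mask_smiles(smiles, start, end, mask_token="<mask>"):
    """Replace the characters smiles[start:end] with the mask token."""
    if not 0 <= start < end <= len(smiles):
        raise ValueError("mask span is out of range")
    return smiles[:start] + mask_token + smiles[end:]


def predict_masked(masked_smiles, top_k=5):
    """Return (token, score) pairs for the single masked position.

    transformers is imported lazily so the helper above works without it.
    Calling this function downloads the checkpoint on first use.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    predictions = fill_mask(masked_smiles, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

For example, mask_smiles("C1=CC=CC=C1", 8, 9) hides the third double bond of a benzene ring, and passing the result to predict_masked would list the model's candidate tokens for that position.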
Model Features
Specialized pre-training for chemical SMILES
Pre-trained specifically on chemical SMILES strings from the ZINC dataset
Molecular property prediction capability
Learned representations can be used to predict molecular properties such as toxicity, solubility, and drug-likeness
Attention visualization
Provides attention visualization tools that help identify important substructures affecting chemical properties
Foundation for transfer learning
Can serve as a feature extractor for graph convolution and attention models, or be fine-tuned directly in the same way as a BERT model
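One way to use the model as a feature extractor is to average the final-layer token vectors of each molecule into a fixed-size embedding, which a downstream property model can then consume. The sketch below assumes the Hub identifier seyonec/ChemBERTa-zinc-base-v1 and uses mean pooling as an illustrative choice; the source does not specify a pooling strategy, and mean_pool and embed_smiles are hypothetical helper names.

```python
# Assumed Hub identifier for this model.
MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padded positions (mask value 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_smiles(smiles_list):
    """Return one mean-pooled embedding (list of floats) per SMILES string.

    torch and transformers are imported lazily; calling this function
    downloads the checkpoint on first use.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    batch = tokenizer(smiles_list, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

    masks = batch["attention_mask"].tolist()
    return [mean_pool(h, m) for h, m in zip(hidden.tolist(), masks)]
```

The resulting vectors could be fed to any standard regressor or classifier for properties such as solubility or toxicity. Pooling in plain Python keeps the example easy to follow; for large batches a tensor-based version would be faster.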
Model Capabilities
Chemical SMILES sequence prediction
Molecular variant generation
Molecular property prediction
Chemical substructure identification
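As a rough illustration of attention-based substructure identification, the sketch below ranks tokens by how much attention they receive in the last layer, averaged over heads. This is a simple heuristic chosen for the example, not the visualization tool mentioned on this page, and attention weights are only a loose indicator of importance. The Hub identifier and the names attention_received, rank_tokens, and last_layer_attention are assumptions.

```python
# Assumed Hub identifier for this model.
MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"


def attention_received(attn):
    """Mean attention each token receives: the column means of a square matrix."""
    n = len(attn)
    return [sum(row[j] for row in attn) / n for j in range(n)]


def rank_tokens(tokens, attn, top_k=3):
    """Return the top_k (token, score) pairs, highest attention received first."""
    scores = attention_received(attn)
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


def last_layer_attention(smiles):
    """Return (tokens, head-averaged last-layer attention matrix as nested lists).

    torch and transformers are imported lazily; calling this function
    downloads the checkpoint on first use.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    batch = tokenizer(smiles, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch, output_attentions=True)

    # outputs.attentions holds one (batch, heads, seq, seq) tensor per layer.
    attn = outputs.attentions[-1][0].mean(dim=0)
    tokens = tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist())
    return tokens, attn.tolist()
```

Calling rank_tokens(*last_layer_attention("CC(=O)O")) would list the tokens that draw the most attention. Special tokens such as the sequence start and end markers often dominate, so they may need to be filtered out before interpreting the ranking.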
Use Cases
Drug discovery
Molecular property optimization
Predict and optimize properties of drug candidates such as solubility and toxicity
Chemical education
Substructure identification teaching
Use attention visualization to help students understand key structures affecting chemical properties
Material design
Molecular variant generation
Predict plausible variants of a molecule within the explorable chemical space
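Variant generation can be approximated by masking one SMILES token at a time and letting the model propose replacements. The sketch below is one possible approach under stated assumptions: the regular expression is a simplified SMILES tokenizer written for this example (it is not the model's own tokenizer), the Hub identifier is assumed, and tokenize_smiles, masked_variants, and propose_variants are hypothetical helper names. Proposed strings are not guaranteed to be valid molecules and should be checked with a cheminformatics toolkit before use.

```python
import re

# Assumed Hub identifier for this model.
MODEL_ID = "seyonec/ChemBERTa-zinc-base-v1"

# Simplified SMILES tokenizer: bracket atoms, Br/Cl, two-digit ring closures,
# single letters, digits, and bond or branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/:~@?>*$().]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)


def masked_variants(smiles, mask_token="<mask>"):
    """Return one string per token, with that token replaced by the mask."""
    tokens = tokenize_smiles(smiles)
    return [
        "".join(tokens[:i] + [mask_token] + tokens[i + 1:])
        for i in range(len(tokens))
    ]


def propose_variants(smiles, top_k=3):
    """Fill each masked position with the model's top_k predictions.

    transformers is imported lazily; calling this function downloads the
    checkpoint on first use.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)
    candidates = set()
    for masked in masked_variants(smiles):
        for pred in fill_mask(masked, top_k=top_k):
            # token_str may carry a leading space in RoBERTa-style vocabularies.
            candidates.add(masked.replace("<mask>", pred["token_str"].strip()))
    candidates.discard(smiles)  # drop the unchanged input molecule
    return sorted(candidates)
```

Masking a single token per variant keeps each proposal close to the starting molecule; masking several positions at once would explore more widely at the cost of more invalid strings.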