
ChemBERTa-zinc-base-v1

Developed by seyonec
A Transformer model based on the RoBERTa architecture, designed for masked language modeling of chemical SMILES strings.
Downloads: 323.83k
Released: 3/2/2022

Model Overview

This model learns representations of chemical SMILES strings through pre-training. It can predict masked tokens in molecular sequences, and its learned representations support molecular property prediction and structural analysis.
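Masked language modeling over SMILES starts by splitting the molecule string into chemically meaningful tokens, one of which is then hidden for the model to predict. The sketch below is illustrative only: the regex tokenizer and the `<mask>` symbol are assumptions, and the actual model uses a learned subword vocabulary rather than this rule-based split.

```python
import re

# Illustrative SMILES tokenizer (an assumption -- the real model uses a learned
# vocabulary). Two-letter elements like Br/Cl must match before one-letter C.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|b|c|n|o|s|p|B|C|N|O|P|S|F|I|@|\+|-|=|#|/|\\|\(|\)|\.|%\d{2}|\d"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into atom, bond, and ring-closure tokens."""
    return SMILES_TOKEN.findall(smiles)

def mask_token(tokens: list[str], position: int, mask: str = "<mask>") -> list[str]:
    """Replace one token with a mask symbol, as in RoBERTa-style MLM training."""
    masked = list(tokens)
    masked[position] = mask
    return masked

if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    tokens = tokenize(aspirin)
    print(tokens)
    print(mask_token(tokens, 4))
```

During pre-training, the model sees the masked sequence and is trained to recover the hidden token from its context, which is how it learns molecular structure.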

Model Features

Specialized pre-training for chemical SMILES
Pre-trained on SMILES strings from the ZINC dataset
Molecular property prediction capability
Learned representations can be used to predict molecular properties such as toxicity, solubility, and drug-likeness
Attention visualization
Provides attention mechanism visualization tools to help identify important substructures affecting chemical properties
Foundation for transfer learning
Can serve as a feature extractor for graph convolution and attention models, or for BERT fine-tuning
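As a sketch of the transfer-learning path, the snippet below mean-pools per-token embeddings into a fixed-size molecule vector and feeds it to a logistic-regression head. The random array stands in for the model's actual hidden states, and the property-prediction head is a hypothetical example; only the 768-dimensional hidden size matches RoBERTa-base.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 768  # RoBERTa-base hidden size

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Collapse (seq_len, dim) token embeddings into one molecule vector."""
    return token_embeddings.mean(axis=0)

def logistic_head(features: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """A hypothetical binary property head (e.g., soluble vs. insoluble)."""
    return float(1.0 / (1.0 + np.exp(-(features @ weights + bias))))

# Stand-in for the model's hidden states on a 20-token SMILES sequence.
hidden_states = rng.standard_normal((20, EMBED_DIM))
weights = rng.standard_normal(EMBED_DIM) * 0.01
prob = logistic_head(mean_pool(hidden_states), weights, bias=0.0)
print(prob)
```

In practice the head's weights would be trained on labeled molecules while the pre-trained encoder is either frozen (feature extraction) or updated jointly (fine-tuning).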

Model Capabilities

Chemical SMILES sequence prediction
Molecular variant generation
Molecular property prediction
Chemical substructure identification

Use Cases

Drug discovery
Molecular property optimization
Predict and optimize properties of drug candidates such as solubility and toxicity
Chemical education
Substructure identification teaching
Use attention visualization to help students understand key structures affecting chemical properties
Material design
Molecular variant generation
Predict reasonable variants of molecules within explorable chemical space
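A masked model can propose variants by masking each position and ranking the vocabulary's fill-ins. The brute-force sketch below imitates that loop without the model: it substitutes each aliphatic atom with a small candidate set, which is an assumption standing in for the model's ranked predictions, and the outputs would still need a validity check (for example with RDKit) before use.

```python
CANDIDATE_ATOMS = ["C", "N", "O", "S"]  # assumed substitution set for illustration

def single_atom_variants(smiles: str) -> set[str]:
    """Enumerate SMILES obtained by swapping one aliphatic atom.

    A stand-in for mask-and-predict generation: the model would rank
    candidate tokens by likelihood instead of enumerating them all.
    """
    variants = set()
    for i, ch in enumerate(smiles):
        if ch in CANDIDATE_ATOMS:
            for atom in CANDIDATE_ATOMS:
                if atom != ch:
                    variants.add(smiles[:i] + atom + smiles[i + 1:])
    return variants

print(sorted(single_atom_variants("CCO")))
```

The model-driven version of this loop keeps only high-likelihood substitutions, which is what constrains the search to "reasonable" variants within the explorable chemical space.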