Smilestokenizer PubChem 1M
Developed by DeepChem
This is a RoBERTa model trained on 1 million SMILES strings from the PubChem 77M dataset, with tokenization done by the Smiles-Tokenizer tool. It is suited to molecular representation learning and cheminformatics tasks.
Downloads 134
Release Time: 3/2/2022
Model Overview
The model is used mainly for molecular representation learning and cheminformatics tasks. It converts SMILES strings into vector representations that can be applied to drug discovery, molecular property prediction, and related work.
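As a rough illustration of how a SMILES string can end up as a single vector, the sketch below averages per-token vectors into one molecule-level vector while skipping padding positions. Mean pooling is an assumption chosen for illustration, since this page does not say how the token outputs are combined. The function name `mean_pool` is hypothetical, and the inputs stand in for encoder outputs and an attention mask.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors, ignoring positions whose mask value is 0.

    Assumption: one vector per token and a 0/1 mask marking padding,
    which is the usual shape of a transformer encoder's output.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have equal length")

    # Keep only the vectors that belong to real (non-padding) tokens.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")

    dim = len(kept[0])
    # Component-wise mean over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

In practice the token vectors would come from the model's last hidden layer. Other pooling choices, such as taking the first token's vector, are equally plausible.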
Model Features
Based on a large-scale chemical dataset
The model is trained on 1 million SMILES from the PubChem 77M dataset, providing broad coverage of chemical structures.
Uses Smiles-Tokenizer
Tokenization uses the specialized Smiles-Tokenizer tool, which is designed for processing SMILES strings.
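To make the idea of SMILES-aware tokenization concrete, here is a minimal sketch that splits a SMILES string into chemically meaningful tokens with a regular expression. The pattern is one commonly used for SMILES tokenization. Whether Smiles-Tokenizer applies exactly this pattern, and how it maps tokens to vocabulary ids, is an assumption not confirmed by this page. The function name `tokenize_smiles` is hypothetical.

```python
import re
from typing import List

# A regular expression commonly used to split SMILES into tokens.
# Assumption: shown for illustration only, not taken from this page.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom, bond, branch, and ring tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # If joining the tokens does not reproduce the input, some characters
    # were not recognized by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens
```

Bracket atoms such as `[NH4+]` and two-letter elements such as `Cl` stay as single tokens, which a character-level split would break apart.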
RoBERTa architecture
Built on the RoBERTa architecture, which gives the model strong sequence modeling and representation learning capabilities.
Model Capabilities
SMILES string encoding
Molecular representation learning
Cheminformatics processing
Use Cases
Drug discovery
Molecular property prediction
Uses the model-generated molecular representations to predict physicochemical properties of molecules.
Cheminformatics
Molecular similarity calculation
Calculates similarities between molecules from their model-generated representations.
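A similarity calculation over such representations could look like the sketch below, which scores two vectors by cosine similarity and ranks a small library against a query. Cosine similarity is an assumption chosen for illustration, as this page does not name a metric. The function names and the idea of a name-to-vector dictionary are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       library: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The vectors passed in would be molecule-level representations produced by the model. The ranking is only as meaningful as those representations are for the task at hand.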