Smilestokenizer PubChem 1M

Developed by DeepChem
This is a RoBERTa model trained on 1 million SMILES strings from the PubChem 77M dataset, with the Smiles-Tokenizer tool used for tokenization. It is suited to molecular representation learning and cheminformatics tasks.
Downloads: 134
Release date: 3/2/2022

Model Overview

The model converts SMILES strings into vector representations for molecular representation learning and cheminformatics tasks. These representations can be used in drug discovery, molecular property prediction, and related applications.
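To illustrate how a sequence encoder's per-token outputs can be reduced to a single vector per molecule, here is a minimal sketch of masked mean pooling. The source does not say which pooling strategy is used with this model, so the function name `masked_mean_pool` and the toy vectors are assumptions for illustration only.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have equal length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 2-dimensional token vectors (hypothetical); the last position is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

In practice the token vectors would come from the encoder's final hidden layer and the mask from the tokenizer; other choices, such as taking the first token's vector, are equally plausible.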

Model Features

Trained on a large-scale chemical dataset
The model is trained on 1 million SMILES strings from the PubChem 77M dataset, providing broad coverage of chemical structures.
Uses Smiles-Tokenizer
Tokenizes input with the specialized Smiles-Tokenizer tool, which is built for processing SMILES strings.
RoBERTa architecture
Built on the RoBERTa architecture, a Transformer encoder suited to sequence modeling and representation learning.
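As a rough illustration of what SMILES-aware tokenization involves, the sketch below splits a SMILES string into atom, bond, ring, and branch tokens with a regular expression. The pattern is one commonly used for SMILES tokenization and is an assumption here: the source does not confirm that the Smiles-Tokenizer tool uses exactly this expression, and the function name `tokenize_smiles` is hypothetical.

```python
import re
from typing import List

# Assumed pattern: bracket atoms first, then two-letter halogens, single-letter
# atoms, bonds, branches, and ring-closure digits. Not taken from the source.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail if any character is left over."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


aspirin_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
salt_tokens = tokenize_smiles("[Na+].[Cl-]")
print(aspirin_tokens)
print(salt_tokens)
```

Matching bracket atoms and two-letter elements before single characters keeps units such as `[Na+]` and `Cl` intact, which a character-level split would break apart.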

Model Capabilities

SMILES string encoding
Molecular representation learning
Cheminformatics processing

Use Cases

Drug discovery
Molecular property prediction
Uses the model-generated molecular representations to predict physicochemical properties of molecules.
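One simple way to predict a property from fixed representations is a k-nearest-neighbour regressor, sketched below. This is an illustrative baseline, not a method described by the source; the function names, the embeddings, and the property values are all invented placeholders.

```python
from typing import Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(
    query: Sequence[float],
    train: Sequence[Tuple[Sequence[float], float]],
    k: int = 3,
) -> float:
    """Predict a property as the mean value of the k closest training vectors."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training items")
    nearest = sorted(train, key=lambda item: squared_distance(query, item[0]))[:k]
    return sum(value for _, value in nearest) / k


# Placeholder (embedding, property value) pairs, invented purely for illustration.
train = [([0.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([5.0, 5.0], 10.0)]
prediction = knn_predict([0.0, 0.4], train, k=2)
print(prediction)
```

With real data, the embeddings would be the model's representations of molecules with measured property values, and a trained regression head would be a natural alternative to this baseline.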
Cheminformatics
Molecular similarity calculation
Calculates molecular similarity from the model-generated molecular representations.
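Similarity between two molecular representations is often measured with cosine similarity; the sketch below shows that calculation. The source does not name a similarity metric, so cosine similarity is an assumption, and the vectors are hypothetical placeholders rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder embeddings standing in for model output.
query = [1.0, 2.0]
library = {"mol_a": [2.0, 4.0], "mol_b": [-2.0, 1.0]}
ranked = sorted(
    library,
    key=lambda name: cosine_similarity(query, library[name]),
    reverse=True,
)
print(ranked)
```

Sorting a compound library by this score against a query molecule gives a basic similarity search.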