SmilesTokenizer_PubChem_1M Open-source Model - Empowering Molecular Representation Learning and Chemical Information Processing

SmilesTokenizer PubChem 1M

Developed by DeepChem

This is a RoBERTa model trained on 1 million SMILES strings drawn from the PubChem 77M dataset and tokenized with the Smiles-Tokenizer tool. It is suited to molecular representation learning and cheminformatics tasks.

Molecular Model

Transformers

#SMILES molecular representation #Cheminformatics #RoBERTa fine-tuning

Downloads 134

Release Time: 3/2/2022

Model Overview

The model is intended primarily for molecular representation learning and cheminformatics. It converts SMILES strings into vector representations that can be used in drug discovery, molecular property prediction, and related applications.
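One common way to turn per-token encoder outputs into a single molecule-level vector is masked mean pooling. The sketch below illustrates that step only. It assumes the encoder has already produced one vector per token and an attention mask marking padding. The function name `masked_mean_pool` and the toy numbers are hypothetical, and the model card does not state which pooling strategy, if any, is recommended.

```python
from typing import List


def masked_mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding slots."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Hypothetical 2-dimensional token vectors: three real tokens plus one padding slot.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
attention_mask = [1, 1, 1, 0]

molecule_vector = masked_mean_pool(token_vectors, attention_mask)
print(molecule_vector)
```

In practice the token vectors would come from the encoder's last hidden layer rather than from hand-written lists.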

Model Features

Based on a large-scale chemical dataset

The model is trained on 1 million SMILES from the PubChem 77M dataset, providing broad coverage of chemical structures.

Uses Smiles-Tokenizer

Tokenizes input with the specialized Smiles-Tokenizer tool, which is designed for SMILES strings rather than natural-language text.
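To give a feel for what SMILES-aware tokenization means, here is a minimal sketch that splits a SMILES string into atom-level tokens with a regular expression of the kind widely used for this purpose. Whether Smiles-Tokenizer uses exactly this pattern is an assumption, and the names `SMILES_TOKEN_PATTERN` and `tokenize_smiles` are hypothetical. The sketch also omits vocabulary lookup and special tokens.

```python
import re
from typing import List

# Assumed pattern: bracket atoms, two-letter halogens, organic-subset atoms,
# aromatic atoms, bonds, branches, and ring-closure digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


# Acetyl chloride: "Cl" stays one token instead of being split into "C" and "l".
print(tokenize_smiles("CC(=O)Cl"))
# A bracket atom with a charge is kept as a single token.
print(tokenize_smiles("C[NH3+]"))
```

Keeping multi-character atoms such as `Cl` and bracket atoms such as `[NH3+]` intact is the main difference from a generic subword tokenizer.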

RoBERTa architecture

Built on the RoBERTa architecture, a Transformer encoder well suited to sequence modeling and representation learning.

Model Capabilities

SMILES string encoding

Molecular representation learning

Cheminformatics processing

Use Cases

Drug discovery

Molecular property prediction

Uses the model-generated molecular representations to predict physicochemical properties of molecules.
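As an illustration of predicting a property from fixed molecular representations, the sketch below uses a k-nearest-neighbor regressor in plain Python. It is one simple option among many, not the method prescribed by the model card. The function `knn_predict`, the 2-dimensional vectors, and the property values are all hypothetical placeholders for real embeddings and measured data.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    train_vectors: List[Sequence[float]],
    train_values: List[float],
    query: Sequence[float],
    k: int = 2,
) -> float:
    """Predict a property as the mean value of the k closest training molecules."""
    if len(train_vectors) != len(train_values):
        raise ValueError("train_vectors and train_values must have the same length")
    ranked = sorted(
        zip(train_vectors, train_values),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Placeholder embeddings and property values (not real measurements).
train_vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
train_values = [1.0, 3.0, 10.0]

predicted = knn_predict(train_vectors, train_values, [0.4, 0.0], k=2)
print(predicted)
```

With real embeddings, a regularized linear model or a small neural network head is a frequent alternative to nearest neighbors.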

Cheminformatics

Molecular similarity calculation

Calculates the similarity between molecules from their model-generated representations.
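Cosine similarity is a common choice for comparing embedding vectors, so a minimal sketch of it is given here. The model card does not specify a similarity measure, so this choice is an assumption, and the function name `cosine_similarity` and the example vectors are hypothetical stand-ins for real molecule embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder embeddings for two molecules.
molecule_a = [3.0, 4.0]
molecule_b = [4.0, -3.0]

print(cosine_similarity(molecule_a, molecule_a))
print(cosine_similarity(molecule_a, molecule_b))
```

Values close to 1 indicate vectors pointing in nearly the same direction, while values near 0 indicate unrelated directions.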
