
RoBERTa Base Word Chinese CLUECorpusSmall

Developed by uer
A word-level Chinese version of the RoBERTa base model pre-trained on the CLUECorpusSmall corpus; word-based tokenization shortens input sequences and improves processing efficiency
Downloads 184
Release Time: 3/2/2022

Model Overview

This model is a word-level version of the Chinese RoBERTa pre-trained model. Compared with character-level models it offers better performance and faster processing, making it suitable for Chinese natural language processing tasks.

Model Features

Tokenization Optimization
Uses sentencepiece word segmentation, which shortens sequences and speeds up processing compared to character-level models
Multiple Sizes Available
Offers five different pre-trained model sizes ranging from Tiny to Base
Public Corpus
Trained on the publicly available CLUECorpusSmall corpus, ensuring reproducible results
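
The tokenization benefit described above can be illustrated with a toy comparison (the segmentation below is made up for illustration and is not the model's actual sentencepiece output): one token per character versus one token per word yields a much shorter sequence for the same sentence.

```python
# Toy illustration: why word-level tokenization shortens sequences
# relative to character-level tokenization of the same Chinese text.
sentence = "北京大学的学生在图书馆学习"

# Character-level: one token per Chinese character.
char_tokens = list(sentence)

# Word-level: a hypothetical segmentation, roughly what a
# sentencepiece-style word tokenizer might produce.
word_tokens = ["北京大学", "的", "学生", "在", "图书馆", "学习"]

print(len(char_tokens))  # 13 tokens
print(len(word_tokens))  # 6 tokens
```

A shorter sequence means fewer positions for self-attention to process, which is where the speed advantage over character-level models comes from.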

Model Capabilities

Text Feature Extraction
Masked Language Prediction
Chinese Text Understanding

Use Cases

Text Completion
Transportation Information Completion
Completing queries like 'What time does the [MASK] to Beijing depart?'
Accurately predicts transportation methods such as 'flight' or 'high-speed rail'
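
Mechanically, fill-mask prediction works by having the model assign a logit to every vocabulary word at the [MASK] position and taking a softmax over them. The sketch below illustrates only that final scoring step; the logits are invented for illustration, not produced by this model.

```python
import math

# Toy sketch of masked-word prediction: invented logits for a few
# candidate words at the [MASK] position in a sentence like
# "去北京的[MASK]几点起飞?" ("What time does the [MASK] to Beijing depart?").
logits = {"飞机": 4.1, "高铁": 3.8, "苹果": -2.0, "跑步": -1.5}

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = softmax(logits)
best = max(probs, key=probs.get)
print(best)  # 飞机 ("flight"), the highest-probability completion
```

In practice the model scores its full vocabulary rather than four hand-picked candidates, but the selection rule is the same.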
Text Feature Extraction
Document Vectorization
Obtaining deep semantic representations of Chinese texts
Can be used for downstream classification, clustering, and other tasks
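
A common way to turn per-token hidden states into one document vector is mean pooling, after which documents can be compared with cosine similarity. This is a minimal sketch of that pooling step; the 4-dimensional vectors are made up for illustration, whereas real hidden states would come from the model.

```python
import math

def mean_pool(token_vectors):
    # Average the per-token vectors into one fixed-size document vector.
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented token vectors for two short documents.
doc_a = mean_pool([[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 2.0]])
doc_b = mean_pool([[2.0, 0.1, 1.0, 1.0]])
print(cosine(doc_a, doc_b))  # close to 1.0: similar documents
```

The resulting fixed-size vectors can feed directly into downstream classifiers or clustering algorithms.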