
Roberta Tiny Word Chinese Cluecorpussmall

Developed by: UER
A Chinese word-based RoBERTa medium model pre-trained on CLUECorpusSmall, featuring an 8-layer, 512-hidden architecture that delivers better performance and faster processing than comparable character-based models.
Downloads: 17
Release date: 3/2/2022

Model Overview

A Chinese word-based RoBERTa pre-trained language model that supports masked-token prediction and text feature extraction, suitable for a wide range of Chinese natural language processing applications.

Model Features

Word-based Advantage
Uses word-level tokenization instead of character-level processing, which significantly shortens input sequences and speeds up inference; the authors report better downstream performance than comparable character-based models.
Multiple Specifications
Available in 5 sizes, from Tiny (L2/H128) up to Base (L12/H768), to fit different computational budgets.
Open-source Training
Trained on the publicly available CLUECorpusSmall corpus with a sentencepiece tokenizer; complete training details are provided for reproducibility.
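To make the word-based advantage concrete, here is a toy illustration of why word segmentation shortens sequences relative to character tokenization. The greedy longest-match segmenter and the mini vocabulary below are hypothetical stand-ins for demonstration only, not the model's actual sentencepiece vocabulary or algorithm.

```python
# Toy illustration (NOT the model's real sentencepiece tokenizer):
# greedy longest-match word segmentation vs. character-level tokenization.

def segment_words(text, vocab, max_word_len=4):
    """Greedily match the longest vocabulary word at each position;
    fall back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocab:
                tokens.append(candidate)
                i += length
                break
    return tokens

# Hypothetical mini word vocabulary, for demonstration only.
vocab = {"下一班", "北京", "时候", "什么", "首都"}
sentence = "下一班去北京的车是什么时候出发"

char_tokens = list(sentence)                    # character-level: one token per character
word_tokens = segment_words(sentence, vocab)    # word-level: multi-character tokens

# Word segmentation yields a shorter sequence than character tokenization.
print(len(char_tokens), len(word_tokens))
```

A shorter token sequence means less compute per example, which is the source of the speedup claimed above.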

Model Capabilities

Chinese text masked prediction
Text feature vector extraction
Downstream task fine-tuning

Use Cases

Text Completion
Transportation Information Query: complete a transit-schedule query sentence.
Example input: 'The next [MASK] to Beijing departs at what time?'
Intelligent Q&A
Fact-based Question Answering: answer common-sense questions.
Example input: '[MASK]'s capital is Beijing.'
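Use cases like these run through the `fill-mask` pipeline in `transformers`. In the sketch below, the checkpoint id is an assumption, and the Chinese prompt is a back-translation of the English example shown above, not necessarily the card's original wording.

```python
# Hedged sketch of the masked-prediction use case. Checkpoint id assumed;
# the Chinese input is a back-translation of the example prompt above.
from transformers import pipeline

unmasker = pipeline(
    "fill-mask",
    model="uer/roberta-medium-word-chinese-cluecorpussmall",  # assumed id
)

# Fact-based Q&A: "[MASK]'s capital is Beijing."
for pred in unmasker("[MASK]的首都是北京。", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```

Each prediction is a dict with the filled token (`token_str`) and its probability (`score`), so the top candidate can be taken as the answer.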