T5 Small Chinese Cluecorpussmall

Developed by uer
A small Chinese T5 model pretrained using the UER-py framework, adopting a unified text-to-text format to handle various Chinese NLP tasks
Downloads 1,336
Release Time: 3/2/2022

Model Overview

This model is the small version of the Chinese T5 series. It uses a unified text-to-text format, so a range of Chinese natural language processing tasks can be expressed as text in, text out. The model was pretrained on the CLUECorpusSmall dataset and generates the content of text spans that have been masked with sentinel tokens.

Model Features

Unified Text-to-Text Format
Adopts T5's unified framework to handle various NLP tasks, simplifying the task processing workflow
Sentinel Token Masking
Masks text spans with sentinel tokens written in the form extraxxx (for example extra0, extra1); the model generates the content of each masked span, enabling flexible text generation
Two-Stage Pretraining
First pretrains with short sequences (length 128), then continues pretraining with long sequences (length 512) to enhance model performance
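
As a rough illustration of the sentinel token masking described above, the following sketch builds a masked input and its target from plain text. The helper name mask_spans and the use of character-offset spans are assumptions made for this example; they are not part of UER-py, and the actual pretraining pipeline selects spans on tokenized text. Spans are assumed not to overlap, and the closing sentinel that T5-style targets usually carry is left out for brevity.

```python
def mask_spans(text, spans):
    """Replace each (start, end) character span with a sentinel token.

    Hypothetical helper for illustration only. Returns (source, target):
    source is the text with spans replaced by extra0, extra1, ...;
    target lists each sentinel followed by the text it replaced.
    Assumes the spans do not overlap.
    """
    source_parts, target_parts = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"extra{i}"
        source_parts.append(text[cursor:start])  # unmasked text before the span
        source_parts.append(sentinel)            # sentinel stands in for the span
        target_parts.append(sentinel)
        target_parts.append(text[start:end])     # the masked content to predict
        cursor = end
    source_parts.append(text[cursor:])           # unmasked tail
    return "".join(source_parts), " ".join(target_parts)


source, target = mask_spans("中国的首都是北京", [(6, 7)])
print(source)  # 中国的首都是extra0京
print(target)  # extra0 北
```

Masking the single character 北 reproduces the input and output pair used in the text completion example on this page.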

Model Capabilities

Text Generation
Text Conversion
Text Completion
Text Summarization

Use Cases

Text Processing
Text Completion
Uses sentinel tokens to predict and complete missing parts of text
For example, given an input meaning 'The capital of China is extra0京', the model generates 'extra0 北', supplying the character 北 that completes 北京 (Beijing)
Text Rewriting
Converts input text into output text with different styles or formats
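
The text completion use case can be sketched with the Hugging Face transformers library. This is a minimal sketch under several assumptions: that the model is published on the Hugging Face Hub under the ID uer/t5-small-chinese-cluecorpussmall, that it loads with BertTokenizer and T5ForConditionalGeneration, and that decoded output separates Chinese characters with spaces. The helpers parse_sentinel_output, fill_sentinels, and complete are hypothetical names introduced for this example.

```python
import re

# Assumed Hugging Face Hub ID for this model.
MODEL_NAME = "uer/t5-small-chinese-cluecorpussmall"


def parse_sentinel_output(generated):
    """Map each sentinel to its predicted text, e.g. 'extra0 北 extra1'."""
    fills = {}
    # Splitting on a capturing group yields: text, sentinel, text, sentinel, ...
    pieces = re.split(r"(extra\d+)", generated)
    for sentinel, text in zip(pieces[1::2], pieces[2::2]):
        # Assumption: spaces are tokenizer artifacts and can be dropped.
        fills[sentinel] = text.replace(" ", "")
    return fills


def fill_sentinels(masked, generated):
    """Substitute predicted spans back into the masked input."""
    fills = parse_sentinel_output(generated)
    return re.sub(r"extra\d+", lambda m: fills.get(m.group(0), ""), masked)


def complete(masked, max_length=50):
    """Run the model on a masked input and return the completed text."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import BertTokenizer, T5ForConditionalGeneration

    tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
    input_ids = tokenizer(masked, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=max_length, do_sample=False)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return fill_sentinels(masked, generated)


# Example call (downloads the model weights on first use):
# print(complete("中国的首都是extra0京"))
```

The parsing helpers are kept separate from the model call so the substitution logic can be checked without downloading weights. Sentinels for which the model predicts nothing are replaced with an empty string.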