RoBERTa Classical Chinese Large Sentence Segmentation
A RoBERTa model pre-trained on classical Chinese texts and designed to segment them into sentences.
Release Time: 3/2/2022
Model Overview
This model segments continuous (unpunctuated) classical Chinese text into complete sentences by tagging each token: the first token of a sentence is tagged 'B' and the last is tagged 'E', while a sentence consisting of a single token is tagged 'S'.
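Given one tag per token, turning the tags back into sentences only requires closing a sentence after every 'E' or 'S'. The following is a minimal sketch, not code from this model card. It assumes one tag per character, as would be the case with a character-level tokenizer, and treats any tag other than 'E' or 'S' as sentence-internal, since the card does not name the tag used for middle characters. The function name `tags_to_sentences` is hypothetical, and the tags in the example are made up for illustration rather than taken from model output.

```python
from typing import List, Sequence


def tags_to_sentences(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Split a token sequence into sentences using B/E/S-style tags.

    A sentence is closed after every token tagged "E" (last token of a
    multi-token sentence) or "S" (single-token sentence). Any other tag,
    such as "B" or an unnamed middle tag, is treated as sentence-internal.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    sentences: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        current.append(token)
        if tag in ("E", "S"):
            sentences.append("".join(current))
            current = []
    if current:
        # Trailing tokens without a closing tag are kept as a final fragment.
        sentences.append("".join(current))
    return sentences


if __name__ == "__main__":
    text = "天地玄黃宇宙洪荒"
    # Made-up tags; "M" is a hypothetical middle tag, not a documented label.
    tags = ["B", "M", "M", "E", "B", "M", "M", "E"]
    print(tags_to_sentences(list(text), tags))  # → ['天地玄黃', '宇宙洪荒']
```

Keeping unterminated trailing tokens as a fragment, instead of dropping them, means no characters of the input are lost if the last prediction is not 'E' or 'S'.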
Model Features
Specialized for Classical Chinese
Trained on classical Chinese texts, so it is geared toward the distinctive grammatical structures and expressions of ancient Chinese.
Accurate Sentence Segmentation
Treats sentence segmentation as token classification with B/E/S tags, so sentence boundaries can be read directly from the predicted tags.
Based on RoBERTa Architecture
Built on the RoBERTa architecture: the model is pre-trained on classical Chinese texts and then fine-tuned for sentence segmentation.
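The B/E/S scheme can also be applied in the other direction, turning an already-segmented text into one tag per character. That is what you would need to build reference labels from a hand-punctuated text, for example to check the model's output. This is an illustrative sketch under stated assumptions, not the model's training code: it assumes one tag per character, and because the card names only 'B', 'E' and 'S', the tag for sentence-internal characters is a hypothetical placeholder (`MIDDLE_TAG = "M"`). The real model's label set may differ and should be checked against its configuration.

```python
from typing import List, Sequence

# Hypothetical label for sentence-internal characters. The model card only
# names "B", "E" and "S"; the real model may use a different middle label.
MIDDLE_TAG = "M"


def sentences_to_tags(sentences: Sequence[str]) -> List[str]:
    """Produce one B/E/S tag per character for a list of segmented sentences."""
    tags: List[str] = []
    for sentence in sentences:
        n = len(sentence)
        if n == 0:
            continue  # empty strings contribute no characters, hence no tags
        if n == 1:
            tags.append("S")  # single-character sentence
        else:
            tags.append("B")  # first character
            tags.extend([MIDDLE_TAG] * (n - 2))  # interior characters
            tags.append("E")  # last character
    return tags


if __name__ == "__main__":
    gold = ["噫", "天喪予"]  # hand-segmented example text
    print(sentences_to_tags(gold))  # → ['S', 'B', 'M', 'E']
```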
Model Capabilities
Classical Chinese processing
Sentence boundary recognition
Text segmentation
Use Cases
Ancient text digitization
Automatic segmentation of ancient texts
Splits unpunctuated ancient literature into complete sentences
Reduces the manual sentence-division work involved in digitizing ancient texts
Academic research
Construction of classical Chinese corpora
Provides linguists with sentence-segmented texts ready for analysis
Facilitates subsequent lexical analysis and grammatical research
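For digitization or corpus-building work like the use cases above, one common way to spot-check segmentation quality is to compare predicted sentence boundaries against a hand-segmented reference and report precision, recall and F1. This is a generic evaluation sketch, not a procedure described for this model. The function names `boundary_offsets` and `boundary_prf` are hypothetical, and the example "prediction" is made up. The boundary at the very end of the text is excluded because every segmentation has it.

```python
from typing import Sequence, Set, Tuple


def boundary_offsets(sentences: Sequence[str]) -> Set[int]:
    """Return the character offset just after each non-empty sentence."""
    offsets: Set[int] = set()
    position = 0
    for sentence in sentences:
        if not sentence:
            continue  # empty strings add no boundary
        position += len(sentence)
        offsets.add(position)
    return offsets


def boundary_prf(predicted: Sequence[str], reference: Sequence[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted sentence boundaries against a reference."""
    text = "".join(reference)
    if "".join(predicted) != text:
        raise ValueError("both segmentations must cover the same text")
    pred = boundary_offsets(predicted)
    gold = boundary_offsets(reference)
    # The end of the text is a boundary in every segmentation, so leave it out.
    pred.discard(len(text))
    gold.discard(len(text))
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(gold) if gold else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    reference = ["天地玄黃", "宇宙洪荒", "日月盈昃", "辰宿列張"]
    predicted = ["天地玄黃宇宙洪荒", "日月盈昃", "辰宿列張"]  # made-up output
    scores = boundary_prf(predicted, reference)
    print(tuple(round(s, 3) for s in scores))  # → (1.0, 0.667, 0.8)
```

In this made-up example the prediction merges the first two clauses, so every predicted boundary is correct (precision 1.0) but one reference boundary is missed (recall 2/3).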