RoBERTa Classical Chinese Large Sentence Segmentation

Developed by KoichiYasuoka
A RoBERTa model pre-trained on classical Chinese texts and designed for sentence segmentation of classical Chinese.
Downloads 20
Release Time: 3/2/2022

Model Overview

This model segments continuous, unsegmented classical Chinese text into complete sentences by assigning a class to each token: the first token of a sentence is labeled 'B' and the last is labeled 'E', while a single-character sentence is labeled 'S'.
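The tagging scheme above can be turned into sentences with a few lines of Python. The following is a minimal sketch, not the model's own decoding code: the function name split_sentences is hypothetical, and the 'M' label used for characters inside a sentence is an assumption for illustration, since this page only names 'B', 'E', and 'S'. Any tag other than 'E' or 'S' is treated as "sentence continues".

```python
def split_sentences(text, tags):
    """Split text into sentences using one B/E/S-style tag per character.

    Hypothetical helper: 'E' and 'S' close a sentence; every other tag
    (including the assumed interior label 'M') continues the current one.
    """
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    sentences = []
    current = []
    for char, tag in zip(text, tags):
        current.append(char)
        # 'E' ends a multi-character sentence; 'S' is a one-character sentence.
        if tag in ("E", "S"):
            sentences.append("".join(current))
            current = []
    if current:
        # Characters after the last closing tag are kept as a final fragment.
        sentences.append("".join(current))
    return sentences


# Hand-written tags for illustration; these are not real model output.
example_text = "學而時習之不亦說乎"
example_tags = ["B", "M", "M", "M", "E", "B", "M", "M", "E"]
print(split_sentences(example_text, example_tags))
```

Keeping the trailing fragment rather than dropping it is a design choice: if the last sentence is cut off, no input characters are lost.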

Model Features

Specialized for Classical Chinese
Optimized for classical Chinese texts and their distinctive grammatical structures and expressions.
Accurate Sentence Segmentation
Uses B/E/S tags to mark where each sentence in a classical Chinese text begins and ends.
Based on RoBERTa Architecture
Built on a RoBERTa model pre-trained on classical Chinese texts and fine-tuned for sentence segmentation.
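A RoBERTa token classifier of this kind is typically run through the Hugging Face transformers library. The sketch below rests on several assumptions that this page does not confirm: the model identifier in MODEL_ID is inferred from the title and developer name, the tokenizer is assumed to emit one token per character with one special token at each end, and the function names predict_tags and punctuate are hypothetical. It requires the torch and transformers packages and downloads the model on first use.

```python
# Assumed identifier, inferred from the page title and developer name.
MODEL_ID = "KoichiYasuoka/roberta-classical-chinese-large-sentence-segmentation"


def predict_tags(text, model_id=MODEL_ID):
    """Return one predicted label per character of text (hypothetical helper)."""
    # Imported here so that punctuate() below works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    input_ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits
    # Drop the first and last positions, assumed to be special tokens.
    label_ids = torch.argmax(logits, dim=2)[0].tolist()[1:-1]
    return [model.config.id2label[i] for i in label_ids]


def punctuate(text, tags, mark="。"):
    """Insert mark after every character tagged 'E' or 'S'."""
    return "".join(c + mark if t in ("E", "S") else c for c, t in zip(text, tags))


if __name__ == "__main__":
    raw = "子曰學而時習之不亦說乎有朋自遠方來不亦樂乎"
    print(punctuate(raw, predict_tags(raw)))
```

If the tokenizer splits or merges characters differently from the assumption above, the tags will not line up with the input, so comparing len(tags) with len(text) before calling punctuate is a sensible check.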

Model Capabilities

Classical Chinese processing
Sentence boundary recognition
Text segmentation

Use Cases

Ancient text digitization
Automatic segmentation of ancient texts
Automatically segments unsegmented ancient literature into complete sentences
Improves the efficiency and accuracy of ancient text digitization
Academic research
Construction of classical Chinese corpora
Provides linguists with pre-processed segmented texts
Facilitates subsequent lexical analysis and grammatical research