
GuwenBERT-large

Developed by ethanyt
A RoBERTa model pre-trained on classical Chinese, suitable for ancient text processing tasks
Release Time: 3/2/2022

Model Overview

This is a RoBERTa model specifically pre-trained for classical Chinese, applicable to downstream tasks such as sentence segmentation, punctuation, and named entity recognition in ancient texts.

Model Features

Specialized Pre-training for Classical Chinese
Specifically pre-trained on classical Chinese to better understand its semantics and grammatical structures
Two-stage Training Strategy
Adopts a two-stage strategy of first training the embedding layer and then training all parameters to improve training effectiveness
Large-scale Training Data
Pre-trained on the Daizhige (殆知阁) ancient literature corpus, containing 15,694 classical Chinese books with 1.7 billion characters
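The two-stage strategy described above can be sketched as follows. This is a minimal illustration, not the authors' actual training code: `TinyMLM` is a hypothetical toy stand-in for the RoBERTa model, and the training loop itself is elided. Stage 1 freezes everything except the embedding layer; stage 2 unfreezes all parameters and continues pre-training.

```python
import torch.nn as nn

# Toy placeholder for a masked-language model (hypothetical; sizes are arbitrary).
class TinyMLM(nn.Module):
    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.Linear(hidden, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, ids):
        return self.lm_head(self.encoder(self.embeddings(ids)))

model = TinyMLM()

# Stage 1: train only the embedding layer; freeze everything else.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("embeddings")

stage1_trainable = [n for n, p in model.named_parameters() if p.requires_grad]

# ... run some pre-training steps on the embedding layer only ...

# Stage 2: unfreeze all parameters and continue pre-training end to end.
for p in model.parameters():
    p.requires_grad = True

stage2_trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Freezing via `requires_grad` keeps the optimizer step cheap in stage 1 and lets the embeddings adapt to the classical Chinese vocabulary before the full network is updated.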

Model Capabilities

Classical Chinese Semantic Understanding
Classical Chinese Masked Language Prediction
Classical Chinese Sentence Segmentation
Classical Chinese Punctuation
Classical Chinese Named Entity Recognition

Use Cases

Ancient Book Processing
Ancient Book Named Entity Recognition
Identify entities such as book titles and person names in ancient books
Achieved second place in the 'Gulian Cup' ancient book named entity recognition evaluation with an F1 score of 84.63
Classical Chinese Sentence Segmentation and Punctuation
Automatically add punctuation to unpunctuated classical Chinese texts
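Punctuation restoration is typically framed as per-character token classification: the model predicts, for each character, which punctuation mark (if any) follows it. The sketch below shows only the hypothetical post-processing step that rebuilds punctuated text from such labels; the label scheme ("O" = no mark, otherwise the mark to append) is an illustrative assumption, not this model's actual output format.

```python
def apply_punctuation(text: str, labels: list[str]) -> str:
    """Insert punctuation after each character according to its label.

    text   -- unpunctuated classical Chinese string
    labels -- one label per character: "O" for none, else the mark to append
    """
    assert len(text) == len(labels)
    out = []
    for ch, lab in zip(text, labels):
        out.append(ch)
        if lab != "O":
            out.append(lab)
    return "".join(out)

# Example: the opening line of the Analects, with assumed model labels.
text = "学而时习之不亦说乎"
labels = ["O", "O", "O", "O", "，", "O", "O", "O", "。"]
print(apply_punctuation(text, labels))  # 学而时习之，不亦说乎。
```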