🚀 From Classical (Ancient) Chinese to Modern Chinese
This model translates Classical (Ancient) Chinese into Modern Chinese, offering a practical tool for readers of classical Chinese literature.
🚀 Quick Start
This model has been developed into an application. 【Sui Wuya】 is a classical Chinese reading application powered by Hugging Face Spaces and Streamlit, which contains a vast collection of books and supports translation while reading. You can input Classical Chinese, either punctuated or unpunctuated, and the model will predict the Modern Chinese expression. Other related models include:
This model is a translator from Classical Chinese to Modern Chinese. You are welcome to visit my GitHub project page on classical Chinese poetry to join the discussion and leave a star ⭐️.
The training corpus consists of over 900,000 sentence pairs (see the link to the dataset 📚). During training, all punctuation marks are removed from the source sequence (the Classical Chinese sentence) with a probability of 50%, which is why the model accepts both punctuated and unpunctuated input.
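The punctuation dropout described above can be sketched as follows. This is only an illustration of the idea, not the author's training code: the function name `maybe_strip_punctuation` and the punctuation set are assumptions, since the exact set used in training is not stated here.

```python
import random

# Assumption: an illustrative punctuation set; the set actually used
# during training is not given in this document.
PUNCTUATION = set(",。、;:?!“”‘’「」『』()《》,.;:?!\"'()")


def maybe_strip_punctuation(text, p=0.5, rng=random):
    """With probability p, remove every punctuation mark from the whole sentence.

    Otherwise the sentence is returned unchanged, so the model sees both
    punctuated and unpunctuated source sequences.
    """
    if rng.random() < p:
        return "".join(ch for ch in text if ch not in PUNCTUATION)
    return text


# p=1.0 always strips, p=0.0 never strips; p=0.5 matches the ratio stated above.
stripped = maybe_strip_punctuation("清风徐来,水波不兴。", p=1.0)
untouched = maybe_strip_punctuation("清风徐来,水波不兴。", p=0.0)
```

Applying the decision once per sentence, rather than per character, matches the description that punctuation is removed from the entire sentence.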
✨ Features
Recommended Inference Channel
⚠️ Important Note
- You must set the eos_token_id parameter of the generate function to 102 to get a complete translated sentence. Otherwise, leftover text may follow the translation, because padding positions were labeled -100 in the training loss calculation. The compute button on the Hugging Face page currently has this issue, so it is recommended to use the following code to get the translation results.
- Set the num_beams parameter of the generate function to 3 or more for better translation results.
- Set the max_length parameter of the generate function to 256. Otherwise, the output sentence may be cut off.
import torch
from transformers import (
    EncoderDecoderModel,
    AutoTokenizer,
)

PRETRAINED = "raynardj/wenyanwen-ancient-translate-to-modern"
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED)
model = EncoderDecoderModel.from_pretrained(PRETRAINED)

def inference(text):
    # Tokenize a single sentence, truncating or padding it to 128 tokens
    tk_kwargs = dict(
        truncation=True,
        max_length=128,
        padding="max_length",
        return_tensors='pt')
    inputs = tokenizer([text,], **tk_kwargs)
    # Generation does not need gradients
    with torch.no_grad():
        return tokenizer.batch_decode(
            model.generate(
                inputs.input_ids,
                attention_mask=inputs.attention_mask,
                num_beams=3,
                max_length=256,
                bos_token_id=101,
                eos_token_id=tokenizer.sep_token_id,  # the id 102 required by the note above
                pad_token_id=tokenizer.pad_token_id,
            ), skip_special_tokens=True)
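To make the eos_token_id note more concrete, here is a small sketch of what "residual" output looks like at the token level and how it could be cut off after the fact. It works on plain Python lists of token ids, and the helper name `truncate_at_eos` is hypothetical; setting eos_token_id in generate, as recommended above, remains the intended fix.

```python
from typing import List

SEP_TOKEN_ID = 102  # the end-of-sequence id given in the Important Note


def truncate_at_eos(token_ids: List[int], eos_token_id: int = SEP_TOKEN_ID) -> List[int]:
    """Keep the ids before the first eos id; return a copy of everything if it is absent."""
    if eos_token_id in token_ids:
        return token_ids[: token_ids.index(eos_token_id)]
    return list(token_ids)


# Made-up ids for illustration: everything after the first 102 is residual text.
generated = [101, 2769, 3221, 102, 4638, 749]
clean = truncate_at_eos(generated)
```

Here `clean` keeps only the ids before the first 102. When generate is given the correct eos_token_id, each sequence is finished at that token, so no such post-processing should be needed.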
💻 Usage Examples
Basic Usage
Of course, well-known sentences often produce some laughable mistakes. If you come across any fun cases, please feel free to send feedback.
>>> inference('非我族类其心必异')
['Not of our clan, their hearts must be different.']
>>> inference('肉食者鄙未能远谋')
['Those who eat meat are vulgar and cannot plan far-reaching.']
>>> inference('江山如此多娇引无数英雄竞折腰惜秦皇汉武略输文采唐宗宋祖稍逊风骚')
['The land is so charming that it attracts countless heroes to bow down. It\'s a pity that Emperor Qin Shi Huang and Emperor Han Wu are slightly lacking in literary grace, and Emperor Tang Zong and Emperor Song Zu are a bit less elegant.']
>>> inference("清风徐来水波不兴")
['A gentle breeze blows slowly, and the water ripples not.']
>>> inference("无他唯手熟尔")
['There is nothing else but being skillful with practice.']
>>> inference("此诚危急存亡之秋也")
['This is truly a critical moment of survival or destruction.']
📚 Documentation
Other Resources for Classical Chinese Poetry