wenyanwen-ancient-translate-to-modern: An Open-source Classical Chinese Translation Model - Free Support for Translating Both Punctuated and Unpunctuated Texts

Wenyanwen Ancient Translate To Modern

Developed by raynardj

This model is used to translate Classical Chinese (Ancient Text) into Modern Chinese, supporting both punctuated and unpunctuated text input.

Machine Translation

Transformers

Chinese #Classical Chinese translation #Ancient text reading assistance #Unpunctuated text processing

Downloads 186

Release Time: 3/2/2022

Model Overview

A specialized model for translating ancient Classical Chinese into Modern Chinese, suitable for classical text reading and learning.

Model Features

Supports unpunctuated input

The model can process Classical Chinese input with or without punctuation, improving usability.

Large-scale training data

Training corpus contains over 900,000 sentence pairs, covering a wide range of Classical Chinese expressions.

Integrated application support

The model powers an application called 【随无涯】 (Sui Wuya), available on Hugging Face Spaces.

Model Capabilities

Classical Chinese to Modern Chinese translation

Processing unpunctuated text

Generating fluent Modern Chinese expressions

Use Cases

Education and learning

Classical text reading assistance

Helping learners understand Classical Chinese masterpieces

Providing accurate Modern Chinese translations

Ancient text digitization

Converting ancient literature into Modern Chinese

Facilitating reading and research for modern readers

🚀 From Classical (Ancient) Chinese to Modern Chinese

This model translates Classical (ancient) Chinese into Modern Chinese, offering a practical tool for readers of classical Chinese literature.

🚀 Quick Start

This model has been built into an application: 【Sui Wuya】, a classical Chinese reading app powered by Hugging Face Spaces and Streamlit. It contains a large collection of books and supports translation while reading. You can input Classical Chinese, punctuated or unpunctuated, and the model will predict the Modern Chinese rendering. Other related models include:

Translation from Modern Chinese to Classical Chinese

This model is a translator from Classical Chinese to Modern Chinese. You are welcome to join the discussion on my GitHub project page on classical Chinese poetry and give it a star ⭐️.

The training corpus consists of over 900,000 sentence pairs. During training, all punctuation marks are removed from the source (Classical Chinese) sentence with a probability of 50%.
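The punctuation removal described above can be sketched as a small augmentation step. This is only an illustration, not the author's training code: the function names, the PUNCTUATION set, and the rng parameter are all assumptions introduced here.

```python
import random

# Assumed punctuation set; the list actually used in training is not given.
PUNCTUATION = set("，。、；：？！「」『』《》（）“”‘’,.;:?!")

def strip_punctuation(text):
    """Remove every character that appears in PUNCTUATION."""
    return "".join(ch for ch in text if ch not in PUNCTUATION)

def maybe_strip_punctuation(text, p=0.5, rng=None):
    """With probability p, strip all punctuation; otherwise return text unchanged."""
    rng = rng or random  # hypothetical hook so a seeded random.Random can be passed in
    return strip_punctuation(text) if rng.random() < p else text
```

Applying maybe_strip_punctuation to each source sentence means the model sees both punctuated and unpunctuated versions of its inputs, which is consistent with it accepting either form at inference time.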

✨ Features

Recommended Inference Method

⚠️ Important Note

You must set the eos_token_id parameter of the generate function to 102 to get a clean, complete translation. Otherwise, leftover text may follow the translated sentence, because padded label positions were set to -100 and therefore ignored when computing the cross-entropy loss during training. The compute widget on the Hugging Face model page currently has this issue, so it is recommended to use the code below to get translation results.

Set the num_beams parameter of the generate function to 3 or more for better translation results.

Set the max_length parameter of the generate function to 256. Otherwise, the output may be cut off mid-sentence.

import torch
from transformers import (
  EncoderDecoderModel,
  AutoTokenizer
)

PRETRAINED = "raynardj/wenyanwen-ancient-translate-to-modern"
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED)
model = EncoderDecoderModel.from_pretrained(PRETRAINED)

def inference(text):
    # Tokenize a single sentence, padded or truncated to 128 tokens.
    tk_kwargs = dict(
        truncation=True,
        max_length=128,
        padding="max_length",
        return_tensors='pt')

    inputs = tokenizer([text,], **tk_kwargs)
    # Generation needs no gradients.
    with torch.no_grad():
        return tokenizer.batch_decode(
            model.generate(
                inputs.input_ids,
                attention_mask=inputs.attention_mask,
                num_beams=3,
                max_length=256,
                bos_token_id=101,
                # Stop at the separator token (the id the note above sets to 102).
                eos_token_id=tokenizer.sep_token_id,
                pad_token_id=tokenizer.pad_token_id,
            ), skip_special_tokens=True)
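To illustrate what stopping at token id 102 means, the sketch below cuts a list of generated token ids at the first end-of-sequence id. It is a hypothetical post-processing fallback for output that was generated without eos_token_id set, not part of the author's code; the function name truncate_at_eos is an assumption introduced here.

```python
def truncate_at_eos(token_ids, eos_token_id=102):
    """Return the ids before the first eos_token_id, or all ids if it never appears."""
    token_ids = list(token_ids)
    if eos_token_id in token_ids:
        # Everything from the first end-of-sequence id onward is leftover text.
        return token_ids[:token_ids.index(eos_token_id)]
    return token_ids
```

Passing eos_token_id to generate, as in the code above, remains the recommended route, since generation then stops at that token instead of producing text that has to be discarded afterwards.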

💻 Usage Examples

Basic Usage

Of course, well-known sentences tend to produce some laughable mistakes. If you come across any amusing cases, please feel free to share them.

>>> inference('非我族类其心必异')
['Not of our clan, their hearts must be different.']
>>> inference('肉食者鄙未能远谋')
['Those who eat meat are vulgar and cannot plan far-reaching.']
# Here, several versions of my model failed to translate the character "输" (one version even translated it as Emperor Qin Shi Huang and Emperor Han Wu). Maybe it's not a very archaic usage.
>>> inference('江山如此多娇引无数英雄竞折腰惜秦皇汉武略输文采唐宗宋祖稍逊风骚')
['The land is so charming that it attracts countless heroes to bow down. It\'s a pity that Emperor Qin Shi Huang and Emperor Han Wu are slightly lacking in literary grace, and Emperor Tang Zong and Emperor Song Zu are a bit less elegant.']
>>> inference("清风徐来水波不兴")
['A gentle breeze blows slowly, and the water ripples not.']
>>> inference("无他唯手熟尔")
['There is nothing else but being skillful with practice.']
>>> inference("此诚危急存亡之秋也")
['This is truly a critical moment of survival or destruction.']
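The inference function above truncates its input at 128 tokens, so a long passage would lose its tail. One possible workaround, sketched here under assumptions, is to split the passage into shorter pieces and translate them one by one. The function name split_for_translation and the default limit of 100 characters are hypothetical; the limit assumes roughly one token per Chinese character, with some margin left for special tokens.

```python
import re

def split_for_translation(text, max_chars=100):
    """Split text into pieces of at most max_chars characters, preferring sentence ends."""
    # Split after sentence-ending marks so each mark stays with its sentence.
    sentences = [s for s in re.split(r"(?<=[。！？；])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a sentence (or an unpunctuated run) that is too long on its own.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Each piece can then be passed to inference separately, for example [inference(chunk)[0] for chunk in split_for_translation(passage)]. Hard splits of unpunctuated text may cut through a clause, so translations near those boundaries deserve a second look.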

📚 Documentation

Other Resources for Classical Chinese Poetry
