opus-tatoeba-es-zh Open Source Machine Translation Model - Freely Translate Spanish to Various Chinese Forms

Opus Tatoeba Es Zh

Developed by Helsinki-NLP

This is a Transformer-based machine translation model from Spanish to Chinese, supporting multiple Chinese dialects and writing forms.

Machine Translation

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multi-dialect support #Standardized preprocessing #SentencePiece tokenization

Downloads 399

Release Time: 3/2/2022

Model Overview

This model is specifically designed for Spanish-to-Chinese translation tasks, supporting various Chinese dialects including Mandarin, Cantonese, Wu, and their different writing forms (simplified, traditional, etc.).

Model Features

Multi-dialect support

Supports translation for multiple Chinese dialects including Mandarin, Cantonese, Wu, and their different writing forms.

Standardized preprocessing

Applies normalization followed by SentencePiece tokenization (spm32k) for text preprocessing.

Language identifier

Each source sentence must begin with a target language identifier (e.g., >>cmn_Hans<<) that selects the Chinese variant to translate into.

Model Capabilities

Spanish-to-Chinese text translation

Translation into multiple Chinese dialects

Output in simplified or traditional characters, selected by the target language identifier

Use Cases

Language learning

Spanish learning aid

Helps Chinese users understand Spanish content

Cross-language communication

Business document translation

Translates Spanish business documents into Chinese

🚀 ES-ZH Translation Model

This project provides a Spanish-to-Chinese translation model, together with its model details and benchmark scores.

🚀 Quick Start

Download the Model

The original model weights are available as opus-2021-01-04.zip (https://object.pouta.csc.fi/Tatoeba-MT-models/spa-zho/opus-2021-01-04.zip).

Test the Model

The test set translations are in opus-2021-01-04.test.txt (https://object.pouta.csc.fi/Tatoeba-MT-models/spa-zho/opus-2021-01-04.test.txt), and the test set scores are in opus-2021-01-04.eval.txt.
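The release files can be fetched with the Python standard library alone. The following is a minimal sketch, not an official download script. The base URL and release name come from the url_model and url_test_set entries in System Info. The location of the .eval.txt file is an assumption (it is taken to sit next to the other two files), and the helper names `artifact_urls` and `download_all` are illustrative.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and release name taken from url_model / url_test_set in System Info.
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models/spa-zho"
RELEASE = "opus-2021-01-04"


def artifact_urls(base_url: str = BASE_URL, release: str = RELEASE) -> dict:
    """Return the URL of each release file, keyed by its role."""
    # Assumption: the .eval.txt file lives next to the .zip and .test.txt files.
    suffixes = {"weights": ".zip", "test_set": ".test.txt", "scores": ".eval.txt"}
    return {role: f"{base_url}/{release}{suffix}" for role, suffix in suffixes.items()}


def download_all(target_dir: str = ".") -> list:
    """Download every release file into target_dir and return the local paths."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in artifact_urls().values():
        local_path = out_dir / url.rsplit("/", 1)[-1]
        urlretrieve(url, str(local_path))  # requires network access
        paths.append(local_path)
    return paths
```

Calling `download_all("opus-es-zh")` would place the three files in a local directory of that name. If the scores file is hosted elsewhere, only the "scores" entry needs to change.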

✨ Features

Translation Direction: Translates from Spanish to multiple Chinese variants.
Model Type: Utilizes a Transformer model.
Pre-processing: Applies normalization and SentencePiece (spm32k, spm32k) pre-processing.
Language Token Requirement: A sentence-initial language token of the form >>id<< (id = valid target language ID) is required.
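The language token requirement is easiest to see in code. The sketch below prefixes each Spanish sentence with a >>id<< token and then translates it with the Marian classes from the Hugging Face transformers library. Several things here are assumptions rather than facts stated on this page: the Hub ID `Helsinki-NLP/opus-tatoeba-es-zh` (inferred from the page title and developer name), the PyTorch backend, and the helper names `add_language_token` and `translate`. It also assumes the transformers, sentencepiece, and torch packages are installed.

```python
def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a Spanish sentence with the >>id<< token the model expects."""
    return f">>{target_id}<< {sentence.strip()}"


def translate(sentences, target_id="cmn_Hans",
              model_name="Helsinki-NLP/opus-tatoeba-es-zh"):
    """Translate Spanish sentences; model_name is an assumed Hub ID."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Every sentence in the batch carries its own target language token.
    prefixed = [add_language_token(s, target_id) for s in sentences]
    batch = tokenizer(prefixed, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate(["Me gusta leer."], target_id="yue_Hant")` would request Cantonese in traditional characters, while the default `cmn_Hans` requests Mandarin in simplified characters.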

📦 Installation

The original model card does not list any installation steps.

📚 Documentation

Model Details

Source Group: Spanish
Target Group: Chinese
OPUS Readme: spa-zho
Model: Transformer
Source Language(s): spa
Target Language(s): cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant hsn hsn_Hani lzh nan wuu yue_Hans yue_Hant
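The target IDs above appear to combine a three-letter language code (for example cmn for Mandarin, yue for Cantonese, wuu for Wu) with an optional four-letter script code (Hans for simplified, Hant for traditional). That reading is an assumption based on the form of the IDs. Under it, a short sketch can split the IDs and group them by language. The function names are illustrative.

```python
from collections import defaultdict

# Target IDs copied from the "Target Language(s)" line in Model Details.
TARGET_IDS = ("cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant hsn hsn_Hani "
              "lzh nan wuu yue_Hans yue_Hant").split()


def split_target_id(target_id: str):
    """Split 'cmn_Hans' into ('cmn', 'Hans'); script is None if absent."""
    language, _, script = target_id.partition("_")
    return language, script or None


def group_by_language(target_ids=TARGET_IDS) -> dict:
    """Map each language code to the list of scripts it is offered in."""
    groups = defaultdict(list)
    for target_id in target_ids:
        language, script = split_target_id(target_id)
        groups[language].append(script)
    return dict(groups)
```

Grouping this way makes it easy to check whether a requested variant, such as Cantonese in simplified characters (yue_Hans), is among the IDs listed for this release before building the language token.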

System Info

Property	Details
hf_name	es-zh
source_languages	spa
target_languages	zho
opus_readme_url	https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/spa-zho/README.md
original_repo	Tatoeba-Challenge
tags	['translation']
languages	['es', 'zh']
src_constituents	('Spanish', {'spa'})
tgt_constituents	('Chinese', {'wuu_Bopo', 'wuu', 'cmn_Hang', 'lzh_Kana', 'lzh', 'wuu_Hani', 'lzh_Yiii', 'yue_Hans', 'cmn_Hani', 'cjy_Hans', 'cmn_Hans', 'cmn_Kana', 'zho_Hans', 'zho_Hant', 'yue', 'cmn_Bopo', 'yue_Hang', 'lzh_Hans', 'wuu_Latn', 'yue_Hant', 'hak_Hani', 'lzh_Bopo', 'cmn_Hant', 'lzh_Hani', 'lzh_Hang', 'cmn', 'lzh_Hira', 'yue_Bopo', 'yue_Hani', 'gan', 'zho', 'cmn_Yiii', 'yue_Hira', 'cmn_Latn', 'yue_Kana', 'cjy_Hant', 'cmn_Hira', 'nan_Hani', 'nan'})
src_multilingual	False
tgt_multilingual	False
long_pair	spa-zho
prepro	normalization + SentencePiece (spm32k, spm32k)
url_model	https://object.pouta.csc.fi/Tatoeba-MT-models/spa-zho/opus-2021-01-04.zip
url_test_set	https://object.pouta.csc.fi/Tatoeba-MT-models/spa-zho/opus-2021-01-04.test.txt
src_alpha3	spa
tgt_alpha3	zho
chrF2_score	0.324
bleu	38.8
brevity_penalty	0.878
ref_len	22762.0
src_name	Spanish
tgt_name	Chinese
train_date	2021-01-04 00:00:00
src_alpha2	es
tgt_alpha2	zh
prefer_old	False
short_pair	es-zh
helsinki_git_sha	dfdcef114ffb8a8dbb7a3fcf84bde5af50309500
transformers_git_sha	1310e1a758edc8e89ec363db76863c771fbeb1de
port_machine	LM0-400-22516.local
port_time	2021-01-04-18:53
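The brevity_penalty and ref_len entries in the table above can be related through the standard BLEU definition, in which the penalty is exp(1 - ref_len / hyp_len) whenever the system output is shorter than the reference. The sketch below inverts that formula to estimate the total output length. It assumes the reported scores follow this standard definition, and since 0.878 is a rounded value the result is only approximate. The function name is illustrative.

```python
import math


def hypothesis_length(ref_len: float, brevity_penalty: float) -> float:
    """Invert BP = exp(1 - ref_len / hyp_len), valid when 0 < BP < 1."""
    if not 0.0 < brevity_penalty < 1.0:
        raise ValueError("only defined for 0 < BP < 1 (output shorter than reference)")
    return ref_len / (1.0 - math.log(brevity_penalty))


# Values taken from the ref_len and brevity_penalty rows of System Info.
hyp_len = hypothesis_length(ref_len=22762.0, brevity_penalty=0.878)
length_ratio = hyp_len / 22762.0
print(f"estimated output length: {hyp_len:.0f} tokens, ratio {length_ratio:.3f}")
```

Under these assumptions the model's translations come out noticeably shorter than the references, which is what a penalty below 1 indicates.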

🔧 Technical Details

The model uses a Transformer architecture and applies normalization and SentencePiece pre-processing. Every input sentence must start with a target language token of the form >>id<<.

📄 License

This project is licensed under the Apache-2.0 license.

📊 Benchmarks

testset	BLEU	chr-F
Tatoeba-test.spa.zho	38.8	0.324
