Opus-mt-en-sit Open-source Translation Model - Free English to Multiple Sino-Tibetan Languages Translation

Opus Mt En Sit

Developed by Helsinki-NLP

This is a multilingual translation model based on the Transformer architecture, supporting translation tasks from English to various Sino-Tibetan languages.

Machine Translation

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multi-dialect Sino-Tibetan translation #SentencePiece tokenization #News domain optimization

Release Time: 3/2/2022

Model Overview

This model focuses on translation tasks from English to Sino-Tibetan languages, supporting multiple languages including Tibetan, Chinese (Simplified and Traditional), Burmese, and more.

Model Features

Multilingual Support

Covers 20 target language IDs, including Tibetan (bod), Bodo (brx), Burmese (mya), and several Chinese varieties (Mandarin, Cantonese, Wu, Gan, Min Nan, Jin, and Literary Chinese), some of them in both Simplified and Traditional scripts.

Preprocessing Optimization

Preprocesses text with normalization followed by SentencePiece tokenization (spm32k models for both the source and the target side).

Target Language Identifier

Each input sentence must begin with a target language token of the form >>id<< (e.g., >>zho_Hans<<), which selects the output language.
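The token requirement can be illustrated with a minimal sketch using the Hugging Face `transformers` Marian classes. The Hub ID `Helsinki-NLP/opus-mt-en-sit` and the helper names `with_lang_token` and `translate` are assumptions made for illustration, not part of this card; running `translate` also assumes `torch` and `sentencepiece` are installed.

```python
from typing import List

# Assumed Hub ID for this model; adjust if the checkpoint lives elsewhere.
MODEL_NAME = "Helsinki-NLP/opus-mt-en-sit"


def with_lang_token(sentences: List[str], lang_id: str) -> List[str]:
    """Prefix each sentence with the >>id<< target language token."""
    return [f">>{lang_id}<< {s.strip()}" for s in sentences]


def translate(sentences: List[str], lang_id: str,
              model_name: str = MODEL_NAME) -> List[str]:
    """Translate English sentences into the language selected by lang_id."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the prefixed sentences as one padded batch of PyTorch tensors.
    batch = tokenizer(with_lang_token(sentences, lang_id),
                      return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate(["How are you?"], "zho_Hans")` would request Simplified Chinese output, while passing `"bod"` would request Tibetan.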

Model Capabilities

Text translation from English to Sino-Tibetan languages

Multilingual translation support

Standardized text processing

Use Cases

Language Learning

English Learning Assistance

Helps Sino-Tibetan language speakers learn English or English speakers learn Sino-Tibetan languages.

Cross-Language Communication

Real-Time Translation

Used for real-time text translation from English into Sino-Tibetan languages.

🚀 eng-sit

This project provides a Transformer-based model for translating English into Sino-Tibetan languages, together with its pre-processing setup and evaluation results.

🚀 Quick Start

This model translates from English to Sino-Tibetan languages. The original weights and test set translations can be downloaded from the links listed under Model Information.

Model Information

Source Group: English
Target Group: Sino-Tibetan languages
OPUS Readme: [eng-sit](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-sit/README.md)
Model: Transformer
Source Language(s): eng
Target Language(s): bod, brx, brx_Latn, cjy_Hans, cjy_Hant, cmn, cmn_Hans, cmn_Hant, gan, lzh, lzh_Hans, mya, nan, wuu, yue, yue_Hans, yue_Hant, zho, zho_Hans, zho_Hant
Pre-processing: normalization + SentencePiece (spm32k, spm32k)
Requirement: A sentence-initial language token is required in the form of >>id<< (id = valid target language ID)
Download Original Weights: [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-sit/opus2m-2020-08-01.zip)
Test Set Translations: [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-sit/opus2m-2020-08-01.test.txt)
Test Set Scores: [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-sit/opus2m-2020-08-01.eval.txt)
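Because a missing or misspelled token is easy to overlook, inputs can be checked before they reach the model. The sketch below is one possible approach; `split_language_token` is a hypothetical helper, and the ID set is copied from the Target Language(s) entry above.

```python
import re
from typing import Tuple

# Target language IDs as listed in this model card.
TARGET_LANG_IDS = frozenset({
    "bod", "brx", "brx_Latn", "cjy_Hans", "cjy_Hant", "cmn", "cmn_Hans",
    "cmn_Hant", "gan", "lzh", "lzh_Hans", "mya", "nan", "wuu", "yue",
    "yue_Hans", "yue_Hant", "zho", "zho_Hans", "zho_Hant",
})

# Matches a sentence-initial >>id<< token and any whitespace after it.
_TOKEN_RE = re.compile(r"^>>([A-Za-z_]+)<<\s*")


def split_language_token(text: str) -> Tuple[str, str]:
    """Return (lang_id, remaining_text) or raise ValueError.

    Raises if the sentence-initial token is missing or names an ID
    that is not in TARGET_LANG_IDS.
    """
    match = _TOKEN_RE.match(text)
    if match is None:
        raise ValueError("missing sentence-initial >>id<< token")
    lang_id = match.group(1)
    if lang_id not in TARGET_LANG_IDS:
        raise ValueError(f"unknown target language ID: {lang_id!r}")
    return lang_id, text[match.end():]
```

A caller might run this check on every input line and reject or log the ones that raise, rather than letting the model translate into an unintended language.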

📚 Documentation

Benchmarks

Testset	BLEU	chr-F
newsdev2017-enzh-engzho.eng.zho	23.5	0.217
newstest2017-enzh-engzho.eng.zho	23.2	0.223
newstest2018-enzh-engzho.eng.zho	25.0	0.230
newstest2019-enzh-engzho.eng.zho	20.2	0.225
Tatoeba-test.eng-bod.eng.bod	0.4	0.147
Tatoeba-test.eng-brx.eng.brx	0.5	0.012
Tatoeba-test.eng.multi	25.7	0.223
Tatoeba-test.eng-mya.eng.mya	0.2	0.222
Tatoeba-test.eng-zho.eng.zho	29.2	0.249

System Info

Property	Details
hf_name	eng-sit
source_languages	eng
target_languages	sit
opus_readme_url	https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-sit/README.md
original_repo	Tatoeba-Challenge
tags	['translation']
languages	['en', 'sit']
src_constituents	{'eng'}
tgt_constituents	set()
src_multilingual	False
tgt_multilingual	True
prepro	normalization + SentencePiece (spm32k, spm32k)
url_model	https://object.pouta.csc.fi/Tatoeba-MT-models/eng-sit/opus2m-2020-08-01.zip
url_test_set	https://object.pouta.csc.fi/Tatoeba-MT-models/eng-sit/opus2m-2020-08-01.test.txt
src_alpha3	eng
tgt_alpha3	sit
short_pair	en-sit
chrF2_score	0.223
bleu	25.7
brevity_penalty	0.907
ref_len	109538.0
src_name	English
tgt_name	Sino-Tibetan languages
train_date	2020-08-01
src_alpha2	en
tgt_alpha2	sit
prefer_old	False
long_pair	eng-sit
helsinki_git_sha	480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
transformers_git_sha	2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
port_machine	brutasse
port_time	2020-08-21-14:41
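The `brevity_penalty` and `ref_len` entries can be related through the standard BLEU definition, where the penalty is exp(1 - r/c) when the system output length c is shorter than the reference length r, and 1 otherwise. The sketch below assumes that definition applies to the reported values; the implied output length it computes is an inference, not a figure stated in this card, and both function names are hypothetical.

```python
import math


def brevity_penalty(hyp_len: float, ref_len: float) -> float:
    """Standard BLEU brevity penalty for output length c and reference length r."""
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def implied_hyp_len(bp: float, ref_len: float) -> float:
    """Invert BP = exp(1 - r/c) to recover c, valid for 0 < bp <= 1."""
    if not 0.0 < bp <= 1.0:
        raise ValueError("brevity penalty must be in (0, 1]")
    return ref_len / (1.0 - math.log(bp))


# Values taken from the System Info table.
REF_LEN = 109538.0
BP = 0.907

hyp_len = implied_hyp_len(BP, REF_LEN)
print(f"implied system output length: {hyp_len:.0f} tokens")
print(f"round-trip brevity penalty:   {brevity_penalty(hyp_len, REF_LEN):.3f}")
```

A penalty below 1 indicates that the system output was shorter overall than the references, which lowers the reported BLEU score.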

📄 License

This project is licensed under the Apache-2.0 license.
