🚀 T5 English, Russian and Chinese Multilingual Machine Translation
This model is a conventional T5 transformer running in multitask mode, configured for machine translation between Russian, Chinese, and English. It translates directly between any pair of these languages; the target language is selected with a short prefix.
🚀 Quick Start
Prerequisites
This model requires the transformers library. You can install it with pip install transformers.
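Before loading the model, it can help to confirm that the environment has what the examples need. The sketch below is only an illustration: the helper name `missing_packages` is hypothetical, and the package list is an assumption (the examples use `return_tensors="pt"`, which needs PyTorch, and the slow `T5Tokenizer` relies on `sentencepiece` in many transformers versions).

```python
from importlib.util import find_spec


def missing_packages(names=("transformers", "sentencepiece", "torch")):
    """Return the top-level package names from `names` that are not importable.

    The default list is an assumption about what the usage examples need,
    not an official requirements list for this model.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All checked packages are available.")
```

The check only looks up top-level package names; it does not verify versions.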
Basic Usage
The model translates directly between any pair of Russian, Chinese, and English. To translate into a target language, prepend a prefix of the form 'translate to <lang>: ', where <lang> is the target language identifier (ru, zh, or en). The source language does not need to be specified, and the source text may mix several languages.
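The prefix convention can be wrapped in a small helper so that a mistyped target language fails early instead of producing an unexpected translation. This is a minimal sketch: the function name `add_target_prefix` and the tuple `SUPPORTED_TARGETS` are hypothetical, and the identifiers `ru`, `zh`, and `en` are assumed from the languages this model covers.

```python
# Assumed target identifiers, based on the languages the model covers.
SUPPORTED_TARGETS = ("ru", "zh", "en")


def add_target_prefix(text: str, target_lang: str) -> str:
    """Prepend the 'translate to <lang>: ' prefix to the source text."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(
            f"Unsupported target language {target_lang!r}; "
            f"expected one of {SUPPORTED_TARGETS}"
        )
    # The source language is not encoded anywhere: only the target is named.
    return f"translate to {target_lang}: {text}"


if __name__ == "__main__":
    print(add_target_prefix("Hello, world!", "ru"))
```

The returned string is what gets passed to the tokenizer in the usage examples.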
💻 Usage Examples
Basic Usage
# Example 1: translate Russian text into Chinese
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
# The prefix names the target language only; the source language is detected implicitly
prefix = 'translate to zh: '
src_text = prefix + "Цель разработки — предоставить пользователям личного синхронного переводчика."
# Tokenize into PyTorch tensors (input_ids and attention_mask)
inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**inputs)
# Decode the generated token ids back into text
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)

# Example 2: translate Chinese text into Russian
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to ru: '
src_text = prefix + "开发的目的是为用户提供个人同步翻译。"
inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**inputs)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
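The examples above translate one sentence at a time. When several sentences share a target language, they can be sent through the model in a single padded batch. The following is a sketch under stated assumptions, not the author's own method: `translate_batch` and `main` are hypothetical names, and the `max_new_tokens=256` default is an arbitrary illustrative limit.

```python
def translate_batch(texts, target_lang, model, tokenizer, max_new_tokens=256):
    """Translate a list of source texts into `target_lang` in one batch."""
    # Every input gets the same target-language prefix.
    prompts = [f"translate to {target_lang}: {text}" for text in texts]
    # padding=True pads shorter prompts so the batch forms a rectangular tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def main():
    # Imported here so the helper above can be reused without loading the model.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    sources = [
        "Цель разработки — предоставить пользователям личного синхронного переводчика.",
        "开发的目的是为用户提供个人同步翻译。",
    ]
    for line in translate_batch(sources, "en", model, tokenizer):
        print(line)
```

Calling `main()` downloads the model on first use. Because the source language is not specified, the batch may mix Russian and Chinese inputs as shown.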
✨ Features
- Supports direct translation between Russian, Chinese, and English.
- Multilingual source text is supported.
- Use a simple prefix to specify the target language.
📚 Documentation
Languages covered
Russian (ru_RU), Chinese (zh_CN), English (en_US)
Tags
- translation
- text2text-generation
- t5
License
apache-2.0
Datasets
Metrics
Model Index
| Model Name | Task | Dataset | Metrics | Source |
| --- | --- | --- | --- | --- |
| utrobinmv/t5_translate_en_ru_zh_base_200 | Translation en-ru | ntrex_en-ru | sacrebleu: 28.575940911021487; chrf: 54.27996346886896; ter: 62.494863914873584; meteor: 0.5174833677740809; rouge (ROUGE-1): 0.1908317951570274; rouge (ROUGE-2): 0.065555552204933; rouge (ROUGE-L): 0.1895542893295215; rouge (ROUGE-LSUM): 0.1893813749889601; bertscore (bertscore_f1): 0.8554933660030365; bertscore (bertscore_precision): 0.8578473615646363; bertscore (bertscore_recall): 0.8534188346862793 | NTREX dataset Benchmark |
| utrobinmv/t5_translate_en_ru_zh_base_200 | Translation ru-en | ntrex_ru-en | sacrebleu: 28.575940911021487; chrf: 54.27996346886896; ter: 62.494863914873584; meteor: 0.5174833677740809; rouge (ROUGE-1): 0.1908317951570274; rouge (ROUGE-2): 0.065555552204933; rouge (ROUGE-L): 0.1895542893295215; rouge (ROUGE-LSUM): 0.1893813749889601; bertscore (bertscore_f1): 0.8554933660030365; bertscore (bertscore_precision): 0.8578473615646363; bertscore (bertscore_recall): 0.8534188346862793 | NTREX dataset Benchmark |
📄 License
This project is licensed under the Apache-2.0 license.