WMT19-RU-EN Open-Source Russian-English Translation Model - Free Deployment for Efficient Russian-English Translation

Wmt19 Ru En

Developed by facebook

This is Facebook's Russian-to-English neural machine translation model, built on the Transformer architecture and trained on WMT19 data.

Machine Translation

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Russian-English translation #WMT19 competition model #High-precision machine translation

Downloads 9,451

Release Time: 3/2/2022

Model Overview

This model is specifically designed for machine translation tasks from Russian to English and is a ported version of the model used by Facebook in the WMT19 news translation task.

Model Features

High-quality translation

The original fairseq system achieved a BLEU score of 41.3 on the WMT19 Russian-English translation task; this ported single-checkpoint model scores 39.20.

Transformer-based

Utilizes the advanced Transformer architecture to provide accurate translation results.

Domain-specific optimization

Specifically optimized for news text translation.

Model Capabilities

Russian to English text translation

News text translation

Long text translation

Use Cases

Content translation

News translation

Translate Russian news content into English

High-quality translation preserving the original semantics and style.

Document translation

English translation of Russian technical documents or reports

Accurate conversion of professional terminology.

Language services

Cross-language communication

Provide real-time translation support for Russian-English bilingual communication

Facilitate cross-language communication.

🚀 FSMT

FSMT is a ported version of the fairseq WMT19 transformer for Russian-English translation. It makes the pretrained fairseq weights usable through the transformers library.

🚀 Quick Start

from transformers import FSMTForConditionalGeneration, FSMTTokenizer

# Load the ported Russian-to-English checkpoint and its tokenizer
mname = "facebook/wmt19-ru-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# Named `text` rather than `input` to avoid shadowing the Python builtin
text = "Машинное обучение - это здорово, не так ли?"
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Machine learning is great, isn't it?

✨ Features

Multi-language Support: All four models are available, covering different language pairs:
- [wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
- [wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
- [wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
- [wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)

Based on Research: It is based on Facebook FAIR's WMT19 News Translation Task Submission.

📚 Documentation

Model description

This is a ported version of the fairseq WMT19 transformer for ru-en. The abbreviation FSMT stands for FairSeqMachineTranslation. For more details, please see Facebook FAIR's WMT19 News Translation Task Submission.

Intended uses & limitations

How to use

The Quick Start code above demonstrates how to use FSMTForConditionalGeneration and FSMTTokenizer for translation.
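The same two classes can also translate several sentences in one call. The following is a minimal sketch, not part of the original model card: the helper names `load_fsmt` and `translate_batch` are hypothetical, and the default `num_beams=5` is an arbitrary choice for illustration. It relies only on the standard tokenizer call with `padding=True`, `generate`, and `batch_decode`.

```python
from typing import List


def load_fsmt(mname: str = "facebook/wmt19-ru-en"):
    """Load tokenizer and model (hypothetical helper; imports transformers lazily)."""
    from transformers import FSMTForConditionalGeneration, FSMTTokenizer

    tokenizer = FSMTTokenizer.from_pretrained(mname)
    model = FSMTForConditionalGeneration.from_pretrained(mname)
    return tokenizer, model


def translate_batch(texts: List[str], tokenizer, model, num_beams: int = 5) -> List[str]:
    """Translate a list of Russian sentences as one padded batch."""
    # Pad to the longest sentence so the batch forms a rectangular tensor
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],  # lets the model ignore padding
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the model downloaded, a call such as `translate_batch(["Машинное обучение - это здорово, не так ли?"], *load_fsmt())` should return a list with one English string per input sentence.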

Limitations and bias

The original model (and this ported one) does not seem to handle inputs with repeated sub-phrases well: [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981).
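One way to reduce exposure to this limitation is to translate sentence by sentence and flag outputs that look suspiciously short. The sketch below is an assumption-laden illustration, not a fix documented by the model authors: `split_sentences` and `looks_truncated` are hypothetical helpers, the regex splitter is deliberately naive, and the `min_ratio=0.5` threshold is arbitrary.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!', '?' or '…'."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


def looks_truncated(source: str, translation: str, min_ratio: float = 0.5) -> bool:
    """Flag a translation whose word count is far below the source's.

    min_ratio is an arbitrary threshold chosen for illustration only.
    """
    src_words = len(source.split())
    out_words = len(translation.split())
    if src_words == 0:
        return False
    return out_words / src_words < min_ratio
```

Each sentence returned by `split_sentences` can be translated on its own, and any pair for which `looks_truncated` returns `True` can be set aside for manual review.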

Training data

Pretrained weights were left identical to the original model released by fairseq. For more details, please see the paper.

Eval results

| pair | fairseq | transformers |
|------|---------|--------------|
| ru-en | [41.3](http://matrix.statmt.org/matrix/output/1907?run_id=6937) | 39.20 |

The score is slightly below the one reported by fairseq, since transformers currently doesn't support:

- model ensemble, therefore the best performing checkpoint was ported (model4.pt).
- re-ranking

The score was calculated using this code:

git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=ru-en
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=15
mkdir -p $DATA_DIR
sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS

Note: fairseq reports using a beam of 50, so you should get a slightly higher score if you re-run with --num_beams 50.
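To compare runs at different beam sizes, the files written via --score_path can be read back and placed side by side. This is a rough sketch under one assumption that the card does not state: that each score file is a JSON object containing a `bleu` key. The helper names `read_bleu` and `compare_runs` are hypothetical.

```python
import json
from typing import Dict


def read_bleu(score_path: str) -> float:
    """Read the BLEU value from one score file (assumes a top-level "bleu" key)."""
    with open(score_path, encoding="utf-8") as f:
        scores = json.load(f)
    return float(scores["bleu"])


def compare_runs(paths_by_beams: Dict[int, str]) -> Dict[int, float]:
    """Map each beam size to its BLEU score, ordered by beam size."""
    return {beams: read_bleu(path) for beams, path in sorted(paths_by_beams.items())}
```

For example, after running the evaluation once with NUM_BEAMS=15 and once with NUM_BEAMS=50 and a different --score_path for each, `compare_runs` could be given a dictionary that maps 15 and 50 to the two file paths.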

Data Sources

BibTeX entry and citation info

@inproceedings{...,
  year={2020},
  title={Facebook FAIR's WMT19 News Translation Task Submission},
  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
  booktitle={Proc. of WMT},
}

📄 License

The model is licensed under the Apache-2.0 license.

TODO

port model ensemble (fairseq uses 4 model checkpoints)
