WMT19 En-De Open-Source English-German Translation Model - Free Deployment to Power WMT19 News Translation!

Wmt19 En De

Developed by facebook

This is an English-German translation model based on the fairseq wmt19 transformer, released by Facebook AI Research, which performed excellently in the WMT19 news translation task.

Machine Translation

Transformers

Supports Multiple Languages
Open Source License: Apache-2.0
#English-German translation #WMT19 benchmark #High BLEU score

Downloads: 99.46k

Release Time: 3/2/2022

Model Overview

This model is specifically designed for machine translation tasks from English to German, based on the Transformer architecture and trained on the WMT19 dataset.

Model Features

High-performance translation

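The original fairseq WMT19 system achieved a BLEU score of 43.1 on the WMT19 English-German news translation task; this single-checkpoint port scores slightly lower (see Eval results).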

Transformer-based

Uses the encoder-decoder Transformer architecture of the fairseq wmt19 system to produce high-quality translations

Easy to use

Can be easily called via the Hugging Face Transformers library

Model Capabilities

English to German text translation

News text translation

Long text translation

Use Cases

News translation

News article translation

Translate English news articles into German

High-quality translation results suitable for news publishing

Business applications

Business document translation

Translate business contracts, reports, and other documents

Maintains accuracy of professional terminology

🚀 FSMT

FSMT is a ported version of the fairseq wmt19 transformer for English-German translation, offering high-quality translation.

🚀 Quick Start

How to use

from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-en-de"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# Use a name other than `input` to avoid shadowing the Python built-in
text = "Machine learning is great, isn't it?"
input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)  # Maschinelles Lernen ist großartig, oder?

✨ Features

The FSMT port covers four translation directions (English-German, English-Russian, Russian-English and German-English), each released as a separate model; this model translates English to German.
Based on the fairseq wmt19 transformer architecture.

📚 Documentation

Model description

This is a ported version of the fairseq wmt19 transformer for en-de.

For more details, please see Facebook FAIR's WMT19 News Translation Task Submission.

The abbreviation FSMT stands for FairSeqMachineTranslation.

All four models are available:

[wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
[wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
[wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
[wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)
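Because each direction is a separate checkpoint, code that supports several directions has to pick the right model id before calling `from_pretrained`. The sketch below is a hypothetical helper (`wmt19_model_name` is not part of transformers); it assumes the `facebook/wmt19-{src}-{tgt}` naming pattern of the four links above.

```python
# Hypothetical helper (not part of transformers): map a (source, target)
# language pair to one of the four facebook/wmt19-* checkpoint names.
SUPPORTED_PAIRS = {("en", "ru"), ("ru", "en"), ("en", "de"), ("de", "en")}


def wmt19_model_name(src: str, tgt: str) -> str:
    """Return the Hugging Face model id for a supported WMT19 FSMT pair."""
    pair = (src.lower(), tgt.lower())
    if pair not in SUPPORTED_PAIRS:
        raise ValueError(f"no WMT19 FSMT checkpoint for {src}->{tgt}")
    return f"facebook/wmt19-{pair[0]}-{pair[1]}"


if __name__ == "__main__":
    print(wmt19_model_name("en", "de"))  # facebook/wmt19-en-de
```

Keeping the supported pairs in an explicit set makes an unsupported direction such as en-fr fail early with a clear error instead of a failed download.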

Intended uses & limitations

Limitations and bias

The original model (and this ported one) does not seem to handle inputs with repeated sub-phrases well: [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981).
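One possible mitigation, which is an assumption and not something fairseq or transformers provides, is to translate one sentence at a time and flag outputs that are much shorter than their source. In the sketch below, `translate` is a hypothetical callable that would wrap the Quick Start's `tokenizer.encode` / `model.generate` / `tokenizer.decode` calls. The sentence splitter is naive, and the 0.5 length ratio is an arbitrary threshold.

```python
import re
from typing import Callable, List

# Hypothetical workaround (not part of fairseq or transformers): translate one
# sentence at a time, and flag outputs that look much shorter than their source.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def looks_truncated(source: str, translation: str, min_ratio: float = 0.5) -> bool:
    """Heuristic: True if the translation has fewer than min_ratio times the source's characters."""
    if not source:
        return False
    return len(translation) / len(source) < min_ratio


def translate_by_sentence(text: str, translate: Callable[[str], str]) -> List[str]:
    """Translate each sentence separately; `translate` wraps the real model call."""
    return [translate(s) for s in split_sentences(text)]


if __name__ == "__main__":
    # str.upper stands in for a real model call in this demo
    print(translate_by_sentence("Hi there. How are you?", str.upper))
    # ['HI THERE.', 'HOW ARE YOU?']
```

This only helps when the repetition spans several sentences. It does nothing for a phrase repeated inside a single sentence, so the length check is still worth running on each output.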

Training data

Pretrained weights were left identical to the original model released by fairseq. For more details, please see the paper.

Eval results

Property	Details
Model Type	FSMT (FairSeq Machine Translation)
Training Data	See paper

Evaluation Results

pair	fairseq
en-de	[43.1](http://matrix.statmt.org/matrix/output/1909?run_id=6862)

The score obtained with this port is slightly below the score reported by fairseq, because transformers currently does not support:

model ensembles, so only the best-performing checkpoint (model4.pt) was ported;
re-ranking.
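According to the paper, the fairseq submission decodes with noisy-channel model reranking. In this approach, an n-best list from the forward model is re-scored using a channel (reverse-direction) model and a language model. The sketch below shows a simplified version of that idea. It assumes the three log-probabilities have already been computed elsewhere. The `Hypothesis` fields, the λ weights and the per-token length normalization are illustrative assumptions, and the paper's exact scoring formula may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    num_tokens: int
    fwd_logprob: float   # log P(y | x) from the forward (en->de) model
    chan_logprob: float  # log P(x | y) from a channel (de->en) model
    lm_logprob: float    # log P(y) from a target-side language model


def noisy_channel_score(h: Hypothesis, lam_channel: float, lam_lm: float) -> float:
    """Weighted sum of the three log-probs, divided by the hypothesis length."""
    total = h.fwd_logprob + lam_channel * h.chan_logprob + lam_lm * h.lm_logprob
    return total / max(h.num_tokens, 1)


def rerank(nbest: List[Hypothesis], lam_channel: float = 1.0,
           lam_lm: float = 0.5) -> List[Hypothesis]:
    """Return the n-best list sorted from best to worst combined score."""
    return sorted(nbest, key=lambda h: noisy_channel_score(h, lam_channel, lam_lm),
                  reverse=True)


if __name__ == "__main__":
    nbest = [
        Hypothesis("hyp A", 5, -2.0, -10.0, -5.0),
        Hypothesis("hyp B", 5, -3.0, -2.0, -4.0),
    ]
    print(rerank(nbest)[0].text)                              # hyp B
    print(rerank(nbest, lam_channel=0.0, lam_lm=0.0)[0].text)  # hyp A
```

Setting both weights to 0 reduces this to ranking by forward-model score alone. In the toy n-best list, that picks a different winner, which is exactly the effect reranking is meant to have.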

The score was calculated using this code:

git clone https://github.com/huggingface/transformers
cd transformers
# sacrebleu must be installed (pip install sacrebleu); it fetches the WMT19 test set
export PAIR=en-de
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=15
mkdir -p $DATA_DIR
sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
# in newer transformers checkouts this script may live under examples/legacy/seq2seq instead
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR \
  $DATA_DIR/val.source $SAVE_DIR/test_translations.txt \
  --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json \
  --bs $BS --task translation --num_beams $NUM_BEAMS
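The reported number is corpus-level BLEU. To illustrate what that measures, below is a minimal pure-Python sketch of the standard formula: clipped n-gram precision for n = 1..4, a geometric mean, and a brevity penalty. It assumes whitespace tokenization, a single reference and no smoothing. Tools such as sacrebleu apply their own tokenization, so this sketch will not reproduce the scores above exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus BLEU (0-100): single reference, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = _ngrams(h, n), _ngrams(r, n)
            # Clipping: a hypothesis n-gram counts at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the reference
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    print(round(corpus_bleu(["the cat sat on the"], ["the cat sat on the mat"]), 2))  # 81.87
```

In the example, every n-gram of the hypothesis matches, so the score is lowered only by the brevity penalty exp(1 - 6/5).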

Note: fairseq reports using a beam of 50, so you should get a slightly higher score if you re-run with --num_beams 50.
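`--num_beams` sets how many partial translations the decoder keeps at each step. With a wider beam, more candidates stay alive, so the search can recover a sequence whose first token looked worse than the greedy choice. This is why a beam of 50 usually scores a bit higher than 15, although a wider beam does not guarantee better BLEU. Below is a minimal sketch using a hypothetical toy model in place of the real decoder. It has no length penalty, and it retires finished hypotheses in a simplified way compared with fairseq or transformers.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
EOS = "</s>"


def beam_search(next_logprobs: Callable[[Seq], Dict[str, float]],
                num_beams: int, max_len: int) -> Tuple[Seq, float]:
    """Keep the num_beams best partial sequences per step; return the best finished one."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Extend every live beam by every possible next token
        candidates = [(seq + (tok,), score + lp)
                      for seq, score in beams
                      for tok, lp in next_logprobs(seq).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:num_beams]:
            (finished if seq[-1] == EOS else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # sequences that reached max_len without EOS
    return max(finished, key=lambda c: c[1])


def toy_model(prefix: Seq) -> Dict[str, float]:
    """Hypothetical next-token distribution; any unlisted prefix ends with EOS."""
    table = {
        (): {"A": 0.6, "B": 0.4},
        ("A",): {"x": 0.5, "y": 0.5},
        ("B",): {"z": 0.9, "w": 0.1},
    }
    return {tok: math.log(p) for tok, p in table.get(prefix, {EOS: 1.0}).items()}


if __name__ == "__main__":
    for beams in (1, 2):
        seq, score = beam_search(toy_model, num_beams=beams, max_len=5)
        print(beams, seq, round(math.exp(score), 2))
    # 1 ('A', 'x', '</s>') 0.3
    # 2 ('B', 'z', '</s>') 0.36
```

Greedy search (one beam) commits to "A" because it is the most likely first token. A beam of two also keeps "B", which leads to a more probable full sequence (0.4 × 0.9 = 0.36 versus 0.6 × 0.5 = 0.3).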


BibTeX entry and citation info

@inproceedings{...,
  year={2019},
  title={Facebook FAIR's WMT19 News Translation Task Submission},
  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
  booktitle={Proc. of WMT},
}

📄 License

This project is licensed under the Apache-2.0 license.

TODO

port model ensemble (fairseq uses 4 model checkpoints)
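At every decoding step, fairseq's ensemble decoding combines checkpoints by averaging their next-token probabilities, working in log space. The sketch below shows a minimal version of that combination step. It assumes each checkpoint's output is already available as a token-to-log-probability dict; that representation and the function name are hypothetical. A real port would perform this on tensors inside the generation loop.

```python
import math
from typing import Dict, List


def ensemble_logprobs(per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Average next-token probabilities across models, returned as log-probs.

    Computes log((1/N) * sum_i exp(lp_i)) with a max-shift for numerical stability.
    """
    n = len(per_model)
    out = {}
    for tok in per_model[0]:
        lps = [m[tok] for m in per_model]
        hi = max(lps)
        out[tok] = hi + math.log(sum(math.exp(lp - hi) for lp in lps)) - math.log(n)
    return out


if __name__ == "__main__":
    m1 = {"a": math.log(0.9), "b": math.log(0.1)}
    m2 = {"a": math.log(0.3), "b": math.log(0.7)}
    avg = ensemble_logprobs([m1, m2])
    print({tok: round(math.exp(lp), 6) for tok, lp in avg.items()})  # {'a': 0.6, 'b': 0.4}
```

Averaging probabilities, rather than log-probabilities, matches what the formula in the docstring computes. In the toy case, the ensemble prefers "a" even though the second model alone prefers "b".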
