🚀 FSMT
This project provides a ported version of the fairseq WMT19 transformer for German-English translation. It offers a practical solution for translation tasks, leveraging pre-trained models from Facebook's research.
🚀 Quick Start
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

# Load the de-en checkpoint and its tokenizer
mname = "facebook/wmt19-de-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# Translate a German sentence into English
input = "Maschinelles Lernen ist großartig, oder?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)  # Machine learning is great, isn't it?
```
✨ Features
- Multiple Language Pairs: All four models are available: `wmt19-en-ru`, `wmt19-ru-en`, `wmt19-en-de`, and `wmt19-de-en` (see the loading sketch after this list).
- Based on Research: The models are based on Facebook FAIR's WMT19 News Translation Task Submission, ensuring high-quality translation.
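
All four checkpoints share the same API, so switching pairs is just a matter of changing the model name. A minimal sketch (the English-to-German pair is used here purely as an example):

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

# Any of the four WMT19 checkpoints loads the same way; only the name changes.
mname = "facebook/wmt19-en-de"  # English -> German, as an example
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
```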
📦 Installation
The model is distributed through the `transformers` library, which you can install via `pip`:

```bash
pip install transformers
```
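
To confirm the install worked, a quick smoke test (a hypothetical check, not part of the original card) is to import the library and print its version:

```python
# Hypothetical smoke test: the import should succeed after `pip install transformers`
import transformers
print(transformers.__version__)
```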
💻 Usage Examples
Basic Usage
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

# Load the de-en checkpoint and its tokenizer
mname = "facebook/wmt19-de-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

# Encode the German input, generate, and decode the English translation
input = "Maschinelles Lernen ist großartig, oder?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)  # Machine learning is great, isn't it?
```
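
The checkpoint can also translate several sentences at once. Below is a minimal sketch assuming the standard `transformers` batched-tokenization and `generate` API; the `num_beams` value is illustrative, not the checkpoint's default.

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

mname = "facebook/wmt19-de-en"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

sentences = [
    "Maschinelles Lernen ist großartig, oder?",
    "Guten Morgen!",
]

# Pad the batch so both inputs share one tensor shape, then decode all outputs.
batch = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = model.generate(**batch, num_beams=5)
for translation in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(translation)
```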
📚 Documentation
Model description
This is a ported version of the fairseq WMT19 transformer for de-en. For more details, please see Facebook FAIR's WMT19 News Translation Task Submission. The abbreviation FSMT stands for FairSeqMachineTranslation.
All four models are available: `wmt19-en-ru`, `wmt19-ru-en`, `wmt19-en-de`, and `wmt19-de-en`.
Intended uses & limitations
Limitations and bias
- The original model (and this ported version) doesn't seem to handle inputs with repeated sub-phrases well; content gets truncated.
Training data
Pretrained weights were left identical to the original model released by fairseq. For more details, please see the paper.
Eval results
| Property | Details |
|----------|---------|
| Model Type | Ported version of the fairseq wmt19 transformer for de-en |
| Training Data | Same as the original model released by fairseq. See the paper for details |
The evaluation results are as follows:
| pair | fairseq | transformers |
|------|---------|--------------|
| de-en | 42.3 | 41.35 |
The score is slightly below the score reported by `fairseq`, since `transformers` currently doesn't support:
- model ensemble, therefore the best performing checkpoint was ported (`model4.pt`)
- re-ranking
The score was calculated using this code:
```bash
git clone https://github.com/huggingface/transformers
cd transformers

export PAIR=de-en
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=15
mkdir -p $DATA_DIR

# Fetch the WMT19 source and reference sets with sacrebleu
sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target

echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
```
Note: fairseq reports using a beam of 50, so you should get a slightly higher score if re-run with `--num_beams 50`.
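
If you prefer to score the generated file directly in Python, a hedged sketch using sacrebleu's Python API (file paths taken from the commands above) could look like this:

```python
# Sketch only: score test_translations.txt against the WMT19 references with sacrebleu
import sacrebleu

with open("data/de-en/test_translations.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("data/de-en/val.target", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```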
Data Sources
BibTeX entry and citation info
```bibtex
@inproceedings{...,
  year={2020},
  title={Facebook FAIR's WMT19 News Translation Task Submission},
  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
  booktitle={Proc. of WMT},
}
```
🔧 Technical Details
The model is a ported version of the fairseq wmt19 transformer. The pre-trained weights are identical to the original fairseq model. The evaluation score is calculated with the script and parameters shown above, and the gap relative to fairseq is mainly due to the lack of support for model ensemble and re-ranking in the `transformers` library.
📄 License
This project is licensed under the Apache-2.0 license.
💡 Usage Tip
Fairseq reports using a beam of 50, so you should get a slightly higher score if you re-run the evaluation with `--num_beams 50`.
TODO
- port model ensemble (fairseq uses 4 model checkpoints)