🚀 TransQuest: Translation Quality Estimation with Cross-lingual Transformers
The goal of quality estimation (QE) is to evaluate the quality of a translation without access to a reference translation. High-accuracy QE that can be easily deployed for multiple language pairs is a crucial missing piece in many commercial translation workflows because of its many potential applications. It can be used to select the best translation when several translation engines are available, or to inform end users about the reliability of automatically translated content. QE systems can also determine whether a translation can be published as is in a given context, requires human post-editing before publication, or needs to be translated from scratch by a human. Quality estimation can be performed at different levels: document, sentence, and word.
With TransQuest, we have open-sourced our research on translation quality estimation. TransQuest also won the sentence-level direct assessment quality estimation shared task in [WMT 2020](http://www.statmt.org/wmt20/quality-estimation-task.html). It outperforms current open-source quality estimation frameworks such as OpenKiwi and DeepQuest.
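The publish / post-edit / retranslate decision described above can be sketched as simple thresholding on a sentence-level score. This is only an illustration: the function names and both cut-off values are hypothetical, and the sketch assumes scores have already been rescaled to a 0-1 range, which may not match the raw output of a given model.

```python
from typing import List

# Hypothetical cut-offs on an assumed 0-1 quality scale.
# They are not part of TransQuest; tune them on your own data.
PUBLISH_THRESHOLD = 0.8
POST_EDIT_THRESHOLD = 0.4


def triage(score: float) -> str:
    """Map one sentence-level quality score to a workflow decision."""
    if score >= PUBLISH_THRESHOLD:
        return "publish"
    if score >= POST_EDIT_THRESHOLD:
        return "post-edit"
    return "retranslate"


def triage_all(scores: List[float]) -> List[str]:
    """Apply the same decision rule to a batch of scores."""
    return [triage(score) for score in scores]


if __name__ == "__main__":
    # Illustrative scores only, not real model output.
    print(triage_all([0.91, 0.55, 0.12]))
```

In practice the thresholds would be calibrated against human judgements for each language pair and domain.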
✨ Features
- Sentence-level translation quality estimation in two aspects: predicting post-editing effort and direct assessment.
- Word-level translation quality estimation, capable of predicting the quality of source words, target words, and target gaps.
- Outperforms current state-of-the-art quality estimation methods such as DeepQuest and OpenKiwi in all languages experimented with.
- Pre-trained quality estimation models for fifteen language pairs are available on HuggingFace.
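To make the "target words and target gaps" distinction concrete, here is a minimal sketch of how word-level labels can be separated. It assumes the interleaved layout used in WMT word-level data, where a target sentence of N tokens carries 2N + 1 tags (a gap tag before the first token, between tokens, and after the last one). The function name is hypothetical, and TransQuest's own output format may differ.

```python
from typing import List, Tuple


def split_interleaved_tags(
    tokens: List[str], tags: List[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split gap/word interleaved tags into gap tags and (token, tag) pairs.

    Assumes tags are ordered gap, word, gap, word, ..., gap.
    """
    expected = 2 * len(tokens) + 1
    if len(tags) != expected:
        raise ValueError(f"expected {expected} tags, got {len(tags)}")
    gap_tags = tags[0::2]   # even positions: gaps around the tokens
    word_tags = tags[1::2]  # odd positions: the tokens themselves
    return gap_tags, list(zip(tokens, word_tags))


if __name__ == "__main__":
    # Illustrative labels only, not real model output.
    tokens = ["Reducing", "these", "conflicts"]
    tags = ["OK", "OK", "OK", "BAD", "OK", "OK", "BAD"]
    gaps, words = split_interleaved_tags(tokens, tags)
    print(gaps)
    print(words)
```

A "BAD" gap tag marks a position where one or more words are missing from the translation, while a "BAD" word tag marks a token that should be replaced or removed.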
📦 Installation
From pip
pip install transquest
From Source
git clone https://github.com/TharinduDR/TransQuest.git
cd TransQuest
pip install -r requirements.txt
💻 Usage Examples
Basic Usage
import torch
from transquest.algo.sentence_level.siamesetransquest.run_model import SiameseTransQuestModel

# Load the pre-trained multilingual direct assessment model.
model = SiameseTransQuestModel("TransQuest/siamesetransquest-da-multilingual")

# Each input is a [source sentence, translation] pair.
predictions = model.predict([["Reducerea acestor conflicte este importantă pentru conservare.", "Reducing these conflicts is not important for preservation."]])
print(predictions)
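One application mentioned in the introduction is choosing the best output when several translation engines are available. The sketch below shows one way to do that with any scorer that takes a list of [source, translation] pairs and returns one number per pair. `pick_best`, `ScoreFn`, and `toy_score_fn` are hypothetical names introduced for illustration; the toy scorer only compares word counts and stands in for a real model. Wrapping `model.predict` as the scorer is an assumption about its return format that should be checked against the documentation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: one score per [source, translation] pair.
ScoreFn = Callable[[List[List[str]]], Sequence[float]]


def pick_best(
    source: str, candidates: List[str], score_fn: ScoreFn
) -> Tuple[str, float]:
    """Return the candidate translation with the highest score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    pairs = [[source, candidate] for candidate in candidates]
    scores = [float(score) for score in score_fn(pairs)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_score_fn(pairs: List[List[str]]) -> List[float]:
    """Stand-in scorer: rewards similar word counts. Not a real QE model."""
    return [
        1.0 / (1 + abs(len(src.split()) - len(tgt.split())))
        for src, tgt in pairs
    ]


if __name__ == "__main__":
    source = "Reducerea acestor conflicte este importantă pentru conservare."
    candidates = [
        "Reducing these conflicts is important for conservation.",
        "Conflicts important.",
    ]
    print(pick_best(source, candidates, toy_score_fn))
```

Passing the scorer in as a callable keeps the selection logic independent of any particular model, so the same function works with a different QE system.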
📚 Documentation
For more details, follow the documentation.
- Installation - Install TransQuest locally using pip.
- Architectures - Check out the architectures implemented in TransQuest.
  - Sentence-level Architectures - We have released two architectures, MonoTransQuest and SiameseTransQuest, to perform sentence-level quality estimation.
  - Word-level Architecture - We have released MicroTransQuest to perform word-level quality estimation.
- Examples - We have provided several examples of how to use TransQuest in recent WMT quality estimation shared tasks.
  - Sentence-level Examples
  - Word-level Examples
- Pre-trained Models - We have provided pre-trained quality estimation models for fifteen language pairs, covering both sentence-level and word-level quality estimation.
  - Sentence-level Models
  - Word-level Models
- Contact - Contact us for any issues with TransQuest.
📄 License
This project is licensed under the Apache-2.0 license.
📄 Citations
If you are using the word-level architecture, please consider citing this paper, which was accepted at ACL 2021.
@InProceedings{ranasinghe2021,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {An Exploratory Analysis of Multilingual Word Level Quality Estimation with Cross-Lingual Transformers},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
year = {2021}
}
If you are using the sentence-level architectures, please consider citing these papers, which were presented at COLING 2020 and at WMT 2020 (held at EMNLP 2020).
@InProceedings{transquest:2020a,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest: Translation Quality Estimation with Cross-Lingual Transformers},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
year = {2020}
}
@InProceedings{transquest:2020b,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest at WMT2020: Sentence-Level Direct Assessment},
booktitle = {Proceedings of the Fifth Conference on Machine Translation},
year = {2020}
}