microtransquest-en_zh-wiki: Open-Source Translation Quality Estimation Model for Sentence-Level and Word-Level Quality Prediction

Developed by TransQuest

A translation quality estimation model based on cross-lingual Transformers, supporting sentence-level and word-level quality prediction

Transformers

License: Apache-2.0 #translation-quality-estimation #cross-lingual-transformer #reference-free-evaluation

Released: 3/2/2022

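Model Overview

TransQuest is a system that automatically estimates translation quality without reference translations. It provides pre-trained models for 15 language pairs and won the sentence-level Direct Assessment quality estimation shared task at WMT 2020.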

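Model Features

Multi-level estimation

Supports translation quality estimation at the sentence level and the word level

Two prediction targets

Covers both post-editing effort (PE effort) and direct assessment score (DA score) prediction

Multilingual support

Pre-trained models are available for 15 language pairs, including common pairs such as English-Chinese

Leading performance

Outperformed existing open-source frameworks such as OpenKiwi and DeepQuest in the WMT 2020 evaluation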

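Model Capabilities

Machine translation quality scoring

Post-editing need prediction

Locating erroneous words

Multilingual quality estimation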

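Use Cases

Translation services

Engine selection

Automatically select the best translation when several translation engines are available

Improves the efficiency of the translation workflow

Quality warnings

Indicate to end users how reliable machine-translated content is

Reduces the risk of relying on low-quality translations

Localization workflows

Automated quality checks

Decide whether a translation can be published as is or needs human intervention

Optimizes the allocation of post-editing resources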
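
The engine-selection use case above can be sketched as a best-of-N choice over candidate translations. This is a minimal illustration and not part of TransQuest: `pick_best_translation` and the `score_fn` callback are hypothetical names, and `score_fn` stands in for any sentence-level quality estimation scorer. It assumes a higher score means better quality, which fits direct assessment scores; for post-editing effort, where lower is better, pass `higher_is_better=False`.

```python
from typing import Callable, Dict, Tuple


def pick_best_translation(
    source: str,
    candidates: Dict[str, str],
    score_fn: Callable[[str, str], float],
    higher_is_better: bool = True,
) -> Tuple[str, str, float]:
    """Return (engine, translation, score) for the best-scoring candidate.

    `candidates` maps a hypothetical engine name to its translation of `source`.
    `score_fn(source, translation)` is a placeholder for a QE model call.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Score every candidate once, keeping the engine name alongside the score.
    scored = [
        (engine, text, score_fn(source, text))
        for engine, text in candidates.items()
    ]
    chooser = max if higher_is_better else min
    return chooser(scored, key=lambda item: item[2])


if __name__ == "__main__":
    # Made-up scores for illustration only; a real setup would call a QE model.
    toy_scores = {"translation A": 0.41, "translation B": 0.73}
    best = pick_best_translation(
        "source sentence",
        {"engine_a": "translation A", "engine_b": "translation B"},
        lambda src, tgt: toy_scores[tgt],
    )
    print(best)
```

Keeping the scorer as a callback means the selection logic does not depend on which QE model, or which score scale, is plugged in.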

🚀 TransQuest: Translation Quality Estimation with Cross-lingual Transformers

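The goal of translation quality estimation (QE) is to evaluate the quality of a translation without access to a reference translation. High-accuracy QE that can be easily deployed for many language pairs is the missing piece in many commercial translation workflows, because it has numerous potential uses. It can be used to select the best translation when several translation engines are available, or to inform end users about the reliability of automatically translated content. In addition, QE systems can be used to decide whether a translation can be published as is in a given context, whether it requires human post-editing before publishing, or whether it needs to be retranslated from scratch by a human. Quality estimation can be done at different levels: document level, sentence level and word level.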
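
The publish, post-edit or retranslate decision described above could be implemented as a simple threshold rule over a sentence-level quality score. The sketch below is only an illustration under stated assumptions: the function name `triage` and both threshold values are hypothetical, the score is assumed to be one where higher means better, and real thresholds would need to be calibrated for the language pair and score scale in use.

```python
def triage(score: float, publish_at: float = 0.7, post_edit_at: float = 0.4) -> str:
    """Map a quality score to a workflow decision.

    Assumes a higher score means better quality. The default thresholds are
    placeholders for illustration, not values recommended by TransQuest.
    """
    if post_edit_at > publish_at:
        raise ValueError("post_edit_at must not exceed publish_at")
    if score >= publish_at:
        return "publish"        # good enough to release as is
    if score >= post_edit_at:
        return "post-edit"      # worth fixing by a human post-editor
    return "retranslate"        # cheaper to translate again from scratch


if __name__ == "__main__":
    # Made-up scores, one per translated segment.
    for segment_score in (0.85, 0.55, 0.20):
        print(segment_score, triage(segment_score))
```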

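With TransQuest, we have open-sourced our research in translation quality estimation, which also won the sentence-level direct assessment quality estimation shared task at WMT 2020. TransQuest outperforms current open-source quality estimation frameworks such as OpenKiwi and DeepQuest.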

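✨ Main Features

Sentence-level translation quality estimation, covering both post-editing effort prediction and direct assessment.
Word-level translation quality estimation, capable of predicting the quality of source words, target words and target gaps.
Outperforms current state-of-the-art quality estimation methods such as DeepQuest and OpenKiwi in all the languages experimented with.
Pre-trained quality estimation models for fifteen language pairs are available on HuggingFace.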

📦 Installation

Install with pip

pip install transquest

Install from source

git clone https://github.com/TharinduDR/TransQuest.git
cd TransQuest
pip install -r requirements.txt
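
After either installation route, it can be useful to confirm that the package is visible to the current Python environment. The snippet below is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name is taken from the `pip install transquest` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string when TransQuest is installed, otherwise None.
    print(installed_version("transquest"))
```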

💻 Usage Example

Basic usage

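import torch
from transquest.algo.word_level.microtransquest.run_model import MicroTransQuestModel

# Load the pre-trained English-Chinese word-level model; use the GPU when one is available.
model = MicroTransQuestModel("xlmroberta", "TransQuest/microtransquest-en_zh-wiki", labels=["OK", "BAD"], use_cuda=torch.cuda.is_available())

# Each input is a [source, target] pair of whitespace-tokenized sentences.
# Note: the sample target sentence below is Latvian; for this English-Chinese model, pass a Chinese translation instead.
source_tags, target_tags = model.predict([["if not , you may not be protected against the diseases . ", "ja tā nav , Jūs varat nepasargāt no slimībām . "]])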
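
To work with the predicted tags, it helps to pair each target token with its tag and to separate word tags from gap tags. The helper below is a sketch, not part of TransQuest, and it rests on an assumption to verify against the installed version: that the target tags for a sentence of n tokens follow the WMT word-level convention of 2n + 1 tags (gap, word, gap, word, ..., gap). If only n tags are present, it treats them as word tags. The names `split_target_tags` and `bad_words` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_target_tags(
    target_tokens: Sequence[str], target_tags: Sequence[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return ([(token, word_tag), ...], gap_tags) for one target sentence.

    Assumed layout (WMT convention): gap, word, gap, word, ..., gap.
    """
    n = len(target_tokens)
    if len(target_tags) == 2 * n + 1:
        gap_tags = list(target_tags[0::2])   # even positions are gaps
        word_tags = list(target_tags[1::2])  # odd positions are words
    elif len(target_tags) == n:
        gap_tags = []                        # no gap tags were provided
        word_tags = list(target_tags)
    else:
        raise ValueError(
            f"expected {n} or {2 * n + 1} tags for {n} tokens, got {len(target_tags)}"
        )
    return list(zip(target_tokens, word_tags)), gap_tags


def bad_words(pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """List the tokens whose tag is BAD."""
    return [token for token, tag in pairs if tag == "BAD"]


if __name__ == "__main__":
    # Hand-written tags for illustration; real tags would come from model.predict.
    tokens = "ja tā nav".split()
    tags = ["OK", "OK", "OK", "BAD", "OK", "OK", "OK"]
    pairs, gaps = split_target_tags(tokens, tags)
    print(pairs, gaps, bad_words(pairs))
```

A `BAD` gap tag marks a position where one or more words are missing between neighbouring tokens, which is why gaps are kept separate from the word tags here.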

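📚 Documentation

For more details, see the following documentation:

Installation - Install TransQuest locally using pip.
Architectures - The architectures implemented in TransQuest.
  1. Sentence-level architectures - Two architectures are released for sentence-level quality estimation: MonoTransQuest and SiameseTransQuest.
  2. Word-level architecture - MicroTransQuest is released for word-level quality estimation.
Examples - Several examples of how to use TransQuest on recent WMT quality estimation shared tasks.
  1. Sentence-level examples
  2. Word-level examples
Pre-trained models - Pre-trained quality estimation models for fifteen language pairs, covering both the sentence level and the word level.
  1. Sentence-level models
  2. Word-level models
Contact - Contact us with any questions about TransQuest.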

📄 License

This project is licensed under the Apache 2.0 license.

📖 Citation

If you use the word-level architecture, please consider citing this paper, which was accepted at ACL 2021:

@InProceedings{ranasinghe2021,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {An Exploratory Analysis of Multilingual Word Level Quality Estimation with Cross-Lingual Transformers},
booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
year = {2021}
}

If you use the sentence-level architectures, please consider citing these papers, published at COLING 2020 and at WMT 2020 (held during EMNLP 2020):

@InProceedings{transquest:2020a,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest: Translation Quality Estimation with Cross-lingual Transformers},
booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
year = {2020}
}

@InProceedings{transquest:2020b,
author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
title = {TransQuest at WMT2020: Sentence-Level Direct Assessment},
booktitle = {Proceedings of the Fifth Conference on Machine Translation},
year = {2020}
}