opus-mt-tc-big-en-el open-source translation model: free, accurate English-to-Modern-Greek translation


Opus Mt Tc Big En El

Developed by Helsinki-NLP

A neural machine translation model for English to Modern Greek, part of the OPUS-MT project.

Machine Translation

Transformers

Supports multiple languages #English-Greek translation #high-accuracy machine translation #multilingual support

Downloads: 111

Released: 4/13/2022

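Model Overview

This model performs neural machine translation from English to Modern Greek and was trained with the transformer-big architecture.

Model Features

High-quality translation

Achieves a BLEU score of 27.4 on the flores101-devtest set and 55.4 on the tatoeba-test-v2021-08-07 set.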
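BLEU measures how many n-grams of the system output also occur in a reference translation, combined with a penalty for outputs that are too short. The sketch below is a minimal corpus-level BLEU for illustration only, assuming one reference per sentence, plain whitespace tokenization, uniform weights over 1- to 4-grams and no smoothing. The scores above were presumably produced with a standard tool such as sacreBLEU, whose tokenization and defaults differ, so this function will not reproduce 27.4 or 55.4 exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Simplified corpus BLEU on a 0-100 scale, one reference per hypothesis.

    Tokenization is plain whitespace splitting (an assumption); no smoothing.
    """
    matches = [0] * max_order  # clipped n-gram matches per order
    totals = [0] * max_order   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_order + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


greek = "Αν δεν ήμουν άφραγκος, θα το αγόραζα."
print(corpus_bleu([greek], [greek]))  # identical output and reference
```

An exact match scores 100.0; any order with zero matching n-grams drives this unsmoothed version to 0.0, which is one reason production tools apply smoothing for sentence-level use.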

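Language support

Supports translation from English to Modern Greek.

Open-source license

Released under the CC BY 4.0 license, which permits use, modification and redistribution provided attribution is given.

Model Capabilities

English-to-Greek text translation

Use Cases

Text translation

Everyday phrase translation

Translates everyday English phrases into Greek.

High-quality output suitable for everyday communication.

Document translation

Translates English documents into Greek.

Preserves the meaning and structure of the source text.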
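OPUS-MT models are trained on sentence-aligned data, so a longer English document is commonly split into sentences before translation and the results are joined afterwards. The sketch below assumes a naive regular-expression splitter (it will mis-split abbreviations such as "Dr.") and uses `translate_batch`, a hypothetical callable standing in for the model calls shown in the usage examples; neither is part of OPUS-MT or transformers.

```python
import re
from typing import Callable, List

# Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split a paragraph into sentences, dropping empty pieces."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def translate_document(text: str,
                       translate_batch: Callable[[List[str]], List[str]]) -> str:
    """Translate paragraph by paragraph, sentence by sentence.

    `translate_batch` is a hypothetical function mapping a list of English
    sentences to a list of Greek translations (for example, a wrapper around
    MarianMTModel.generate). Blank lines between paragraphs are preserved.
    """
    translated = []
    for para in text.split("\n\n"):
        sentences = split_sentences(para)
        translated.append(" ".join(translate_batch(sentences)) if sentences else "")
    return "\n\n".join(translated)


# Stand-in "translator" that upper-cases text, just to show the data flow.
print(translate_document("Hello there. How are you?\n\nBye!",
                         lambda xs: [x.upper() for x in xs]))
```

Keeping paragraph boundaries outside the model call is a deliberate choice: the model only sees individual sentences, and the document layout is restored around its output.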

🚀 opus-mt-tc-big-en-el

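This is a neural machine translation model for translating from English (en) to Modern Greek (1453 to present) (el). The model is part of the OPUS-MT project, an effort to make neural machine translation models widely available and accessible for many languages of the world. All models were originally trained with the excellent Marian NMT framework, an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using Hugging Face's transformers library. Training data is taken from OPUS, and the training pipeline follows the procedures of OPUS-MT-train.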

Publications: OPUS-MT – Building open translation services for the World and The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT (please cite these if you use this model).

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

✨ Key Features

Translates from English to Modern Greek.
Part of the OPUS-MT project and trained on data from OPUS.
Trained with the Marian NMT framework, then converted to PyTorch.

📦 Installation

The original documentation does not provide installation steps.

💻 Usage Examples

Basic usage

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "If I weren't broke, I'd buy it.",
    "I received your telegram."
]

# Load the tokenizer and model from the Hugging Face Hub
model_name = "Helsinki-NLP/opus-mt-tc-big-en-el"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch (padding to equal length) and generate translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

# Decode each output sequence, dropping special tokens such as </s> and <pad>
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# expected output:
#     Αν δεν ήμουν άφραγκος, θα το αγόραζα.
#     Έλαβα το τηλεγράφημα σου.
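The basic example pads and translates every input in a single `generate` call, so memory use grows with the number of sentences. A common workaround is to process fixed-size chunks. The helper below is a minimal, framework-independent sketch of the chunking step; the default batch size of 16 is an arbitrary assumption, and the commented loop shows how each chunk could be fed to the `tokenizer` and `model` objects from the basic example.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 16) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Usage with the objects from the basic example (requires the model download):
# for batch in batched(src_text, batch_size=16):
#     outputs = model.generate(**tokenizer(batch, return_tensors="pt", padding=True))
#     print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

print(list(batched(["a", "b", "c", "d", "e"], batch_size=2)))
```

On Python 3.12 and later, `itertools.batched` offers similar behavior, yielding tuples instead of lists.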

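Advanced usage

from transformers import pipeline

# The translation pipeline returns a list of dicts with a "translation_text" key
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-el")
print(pipe("If I weren't broke, I'd buy it.")[0]["translation_text"])

# expected output: Αν δεν ήμουν άφραγκος, θα το αγόραζα.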

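📚 Detailed Documentation

Model information

Property	Details
Release date	2022-03-13
Source language	English (eng)
Target language	Modern Greek (ell)
Model type	transformer-big
Training data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original model	opusTCv20210807+bt_transformer-big_2022-03-13.zip
More information on released models	OPUS-MT eng-ell README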
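The original model archive name appears to encode three of the fields in the table above (training data, model type and release date), separated by underscores. The sketch below splits a name under that assumption; the layout is inferred from this single filename rather than from a documented specification, and `ReleaseInfo` and `parse_release_name` are hypothetical names.

```python
from datetime import date
from typing import NamedTuple


class ReleaseInfo(NamedTuple):
    training_data: str
    model_type: str
    release_date: date


def parse_release_name(filename: str) -> ReleaseInfo:
    """Split '<data>_<model type>_<YYYY-MM-DD>.zip' into its parts (assumed layout)."""
    stem = filename.removesuffix(".zip")  # requires Python 3.9+
    # Split from the right so underscores inside the data field would survive.
    data, model_type, day = stem.rsplit("_", 2)
    return ReleaseInfo(data, model_type, date.fromisoformat(day))


print(parse_release_name("opusTCv20210807+bt_transformer-big_2022-03-13.zip"))
```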

Benchmarks

Language pair	Test set	chr-F	BLEU	#Sentences	#Words
eng-ell	tatoeba-test-v2021-08-07	0.73660	55.4	10899	66884
eng-ell	flores101-devtest	0.53952	27.4	1012	26615
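The chr-F column is a character n-gram F-score, reported here on a 0 to 1 scale. The sketch below is a simplified sentence-level version, assuming character orders 1 to 6, whitespace removed before extracting n-grams, and beta = 2 so that recall is weighted more than precision. Tools such as sacreBLEU aggregate statistics over the whole test set and differ in edge cases, so this function is not expected to reproduce the table values.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of `text` with all whitespace removed."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale.

    Precision and recall are averaged over the n-gram orders that occur in
    both strings, then combined into an F-beta score.
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped (minimum-count) matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("Έλαβα το τηλεγράφημα σου.", "Έλαβα το τηλεγράφημά σου."))
```

Because it works on characters rather than words, chrF gives partial credit for near-misses such as a missing accent, which matters for a morphologically rich target language like Greek.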

Test set translations: opusTCv20210807+bt_transformer-big_2022-03-13.test.txt
Test set scores: opusTCv20210807+bt_transformer-big_2022-03-13.eval.txt
Benchmark results: benchmark_results.txt
Benchmark output: benchmark_translations.zip

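Acknowledgements

This work is supported by:

[Pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866) of the [European Language Grid](https://www.european-language-grid.eu/).
The [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 771113).
The MeMAD project, funded by the European Union's Horizon 2020 research and innovation programme (grant agreement No 780069).

We are also grateful for the generous computational resources and IT infrastructure provided by CSC (IT Center for Science, Finland).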