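🚀 Unbabel/wmt22-cometkiwi-da Translation Quality Estimation Model
This is a translation quality estimation model: it takes a source sentence and its translation and returns a score that reflects the quality of the translation, without needing a reference translation.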
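🚀 Quick Start
This is a COMET quality estimation (QE) model: given a source sentence and its translation, it returns a score that reflects translation quality.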
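In the Python API each input is one source/translation pair, written as a dict with the keys "src" and "mt" (the same format as the Python usage example in this card). The sketch below builds that list from two parallel lists; the `build_qe_inputs` helper is hypothetical, not part of unbabel-comet.

```python
from typing import Dict, List


def build_qe_inputs(sources: List[str], translations: List[str]) -> List[Dict[str, str]]:
    """Pair each source sentence with its translation in the {"src", "mt"}
    dict format. No reference translation is needed for QE."""
    if len(sources) != len(translations):
        raise ValueError(
            f"got {len(sources)} sources but {len(translations)} translations"
        )
    return [{"src": src, "mt": mt} for src, mt in zip(sources, translations)]


data = build_qe_inputs(
    ["The output signal provides constant sync so the display never glitches."],
    ["Das Ausgangssignal bietet eine konstante Synchronisation, so dass die Anzeige nie stört."],
)
print(data[0]["mt"])
```

Checking the lengths up front matters because `zip` would otherwise silently drop unmatched sentences.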
📚 Documentation
Citation
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task (Rei et al., WMT 2022)
Model Information
| Property | Details |
|---|---|
| Model type | Translation quality estimation model |
| Base model | microsoft/infoxlm-large |
| Languages | Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish |
| License | cc-by-nc-sa-4.0 |
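Intended Use
The model is intended for reference-free machine translation evaluation. Given a source text and its translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.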
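Because the output is a single score in [0, 1], one way to act on it is to route low-scoring translations to a human reviewer. The sketch below assumes you already have the scores (for example from the Python usage example). The `flag_for_review` helper and the 0.5 threshold are hypothetical: the model card does not recommend any cut-off, so calibrate it on your own data.

```python
from typing import List, Tuple

# Hypothetical cut-off; the model card does not recommend a threshold.
REVIEW_THRESHOLD = 0.5


def flag_for_review(
    pairs: List[Tuple[str, str]], scores: List[float], threshold: float = REVIEW_THRESHOLD
) -> List[Tuple[str, str, float]]:
    """Return (source, translation, score) triples scoring below threshold.
    Scores are assumed to lie in [0, 1], where 1 means a perfect translation."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [(src, mt, s) for (src, mt), s in zip(pairs, scores) if s < threshold]


# Illustrative, made-up scores (not real model output).
pairs = [("Hello.", "Hallo."), ("Good night.", "Guten Morgen.")]
flagged = flag_for_review(pairs, [0.85, 0.31])
print(flagged)  # [('Good night.', 'Guten Morgen.', 0.31)]
```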
Languages Covered
The model is built on InfoXLM, which covers the following languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.
Results for language pairs that include a language not on this list may therefore be unreliable!
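A pipeline can guard against this by checking both sides of a language pair before trusting a score. This minimal sketch uses a hypothetical `pair_is_covered` helper and only a handful of the languages listed above, keyed by ISO 639-1 codes chosen for illustration; a real check would include the full list.

```python
# Hypothetical subset of the covered languages, keyed by ISO 639-1 code.
# Extend it with the remaining languages from the list above.
COVERED_LANGUAGES = {
    "cs": "Czech",
    "de": "German",
    "en": "English",
    "ja": "Japanese",
    "uk": "Ukrainian",
    "zh": "Chinese",
}


def pair_is_covered(src_lang: str, tgt_lang: str) -> bool:
    """True only if both the source and target language are covered."""
    return src_lang in COVERED_LANGUAGES and tgt_lang in COVERED_LANGUAGES


print(pair_is_covered("en", "de"))  # True
print(pair_is_covered("en", "zu"))  # False: Zulu is not on the list
```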
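📦 Installation
Using this model requires unbabel-comet:
pip install --upgrade pip # make sure pip is up to date
pip install "unbabel-comet>=2.0.0"
Before using the model, make sure you have acknowledged its license and logged in to the Hugging Face Hub:
huggingface-cli login
# or pass the token from an environment variable
huggingface-cli login --token $HUGGINGFACE_TOKEN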
💻 Usage Examples
Basic Usage
The model can be used from the command line through comet-score:
comet-score -s {source-input}.txt -t {translation-output}.txt --model Unbabel/wmt22-cometkiwi-da
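comet-score reads the source and translation files line by line, so line i of the -s file should pair with line i of the -t file (this is our reading of the CLI; check `comet-score --help` for your installed version). The hypothetical helper below writes two such aligned files from in-memory pairs and rejects segments containing newlines, which would shift the alignment.

```python
from pathlib import Path
from typing import List, Tuple


def write_cli_inputs(pairs: List[Tuple[str, str]], src_path: str, mt_path: str) -> None:
    """Write line-aligned source and translation files for comet-score.
    Line i of src_path corresponds to line i of mt_path."""
    for src, mt in pairs:
        # An embedded newline would split one segment across two lines.
        if "\n" in src or "\n" in mt:
            raise ValueError("segments must not contain newlines")
    Path(src_path).write_text("".join(src + "\n" for src, _ in pairs), encoding="utf-8")
    Path(mt_path).write_text("".join(mt + "\n" for _, mt in pairs), encoding="utf-8")
```

For example, `write_cli_inputs(pairs, "source-input.txt", "translation-output.txt")` followed by `comet-score -s source-input.txt -t translation-output.txt --model Unbabel/wmt22-cometkiwi-da`.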
Advanced Usage
The model can also be called from Python:
from comet import download_model, load_from_checkpoint
# Download the checkpoint from the Hugging Face Hub (requires the login above) and load it
model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)
# Each item pairs a source sentence ("src") with its machine translation ("mt");
# no reference translation is needed
data = [
    {
        "src": "The output signal provides constant sync so the display never glitches.",
        "mt": "Das Ausgangssignal bietet eine konstante Synchronisation, so dass die Anzeige nie stört."
    },
    {
        "src": "Kroužek ilustrace je určen všem milovníkům umění ve věku od 10 do 15 let.",
        "mt": "Кільце ілюстрації призначене для всіх любителів мистецтва у віці від 10 до 15 років."
    },
    {
        "src": "Mandela then became South Africa's first black president after his African National Congress party won the 1994 election.",
        "mt": "その後、1994年の選挙でアフリカ国民会議派が勝利し、南アフリカ初の黒人大統領となった。"
    }
]
# Score the pairs in batches of 8 on one GPU (set gpus=0 to run on CPU)
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
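In unbabel-comet 2.x, the object returned by `predict` exposes per-segment scores as `model_output.scores` and an aggregate as `model_output.system_score`, which as far as we know is the mean of the segment scores. The hypothetical `summarize` helper below takes such a score list and reports the mean together with the k lowest-scoring translations; the scores in the example are made up, not real model output.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize(
    data: List[Dict[str, str]], scores: List[float], k: int = 2
) -> Tuple[float, List[Tuple[float, str]]]:
    """Return the mean segment score and the k lowest-scoring
    (score, translation) pairs, worst first."""
    if len(data) != len(scores):
        raise ValueError("expected one score per input segment")
    worst = sorted(zip(scores, (item["mt"] for item in data)))[:k]
    return mean(scores), worst


example_data = [
    {"src": "Hello.", "mt": "Hallo."},
    {"src": "Good night.", "mt": "Guten Morgen."},
    {"src": "Thank you.", "mt": "Danke."},
]
# Made-up scores standing in for model_output.scores
system, worst = summarize(example_data, [0.85, 0.31, 0.90], k=1)
print(worst)  # [(0.31, 'Guten Morgen.')]
```

With the real model, call `summarize(data, model_output.scores)`; the returned mean should agree with `model_output.system_score`.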
📄 License
This project is licensed under cc-by-nc-sa-4.0.