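🚀 Unbabel/wmt22-cometkiwi-da: Translation Quality Estimation Model
This is a translation quality estimation model. Given a source sentence and its translation, it returns a score that reflects the quality of the translation.
🚀 Quick Start
This is a COMET quality estimation model. It takes a source sentence and its translation as input and returns a score reflecting translation quality; no reference translation is needed.
📚 Documentation
Citation
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task (Rei et al., WMT 2022)
Model Information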
Property | Details |
---|---|
Model type | Translation quality estimation |
Base model | microsoft/infoxlm-large |
Languages | Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish |
License | cc-by-nc-sa-4.0 |
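Intended Use
Our model is intended for reference-free machine translation evaluation. Given a source text and its translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.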
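Because each segment gets a single score in [0, 1], a common downstream step is to flag low-scoring translations for human review. The sketch below works on scores you already have (for example, the per-segment scores from `model.predict`). The helper name `flag_low_quality` and the default threshold of 0.5 are hypothetical choices for illustration, not values recommended by the model authors; calibrate any threshold on your own data.

```python
from typing import List, Tuple


def flag_low_quality(
    segments: List[str], scores: List[float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (translation, score) pairs scoring below `threshold`, worst first.

    Hypothetical helper: threshold=0.5 is an illustrative cut-off only.
    """
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    flagged = [(seg, sc) for seg, sc in zip(segments, scores) if sc < threshold]
    # Sort ascending by score so the weakest translations come first.
    return sorted(flagged, key=lambda pair: pair[1])


if __name__ == "__main__":
    mts = ["translation A", "translation B", "translation C"]
    fake_scores = [0.83, 0.41, 0.12]  # made-up numbers, not real model output
    print(flag_low_quality(mts, fake_scores))
    # -> [('translation C', 0.12), ('translation B', 0.41)]
```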
Languages Covered
The model is built on InfoXLM, which covers the following languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.
Consequently, results for language pairs that include a language outside this list may be unreliable!
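A cheap safeguard against the unreliable case above is to check a language pair against the covered set before scoring. This sketch is hypothetical: `COVERED_SUBSET` holds only a small, hand-picked set of ISO 639-1 codes for languages that appear in the list above (it is not the full list), and `check_pair` is an illustrative name, not part of unbabel-comet. Extend the set to mirror the full list before relying on it.

```python
import warnings

# Hypothetical, PARTIAL set of ISO 639-1 codes for languages named in the
# coverage list above. It is deliberately incomplete; extend it for real use.
COVERED_SUBSET = frozenset(
    {"af", "ar", "cs", "de", "en", "es", "fr", "ja", "ko", "pt", "ru", "uk", "zh"}
)


def check_pair(src_lang: str, tgt_lang: str, covered=COVERED_SUBSET) -> bool:
    """Return True if both languages are in `covered`; otherwise warn and return False."""
    missing = [lang for lang in (src_lang, tgt_lang) if lang.lower() not in covered]
    if missing:
        warnings.warn(
            f"Language(s) {missing} not in the covered set; "
            "quality scores for this pair may be unreliable."
        )
        return False
    return True


if __name__ == "__main__":
    print(check_pair("en", "de"))  # True
    print(check_pair("en", "xx"))  # False, plus a warning ("xx" is a placeholder code)
```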
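📦 Installation
Using this model requires unbabel-comet:
```bash
pip install --upgrade pip  # make sure pip is up to date
pip install "unbabel-comet>=2.0.0"
```
Before using the model, make sure you have acknowledged its license and logged in to the Hugging Face Hub:
```bash
huggingface-cli login
# or, using an environment variable
huggingface-cli login --token $HUGGINGFACE_TOKEN
```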
💻 Usage Examples
Basic Usage
The model can be used from the comet CLI:
```bash
comet-score -s {source-input}.txt -t {translation-output}.txt --model Unbabel/wmt22-cometkiwi-da
```
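The `-s` and `-t` inputs above are, as we understand the CLI, plain-text files with one segment per line, aligned by line number. If you want to score the same files from Python instead, they can be turned into the list of `{"src": ..., "mt": ...}` dicts that the Python example consumes. `load_parallel` is a hypothetical helper, not part of unbabel-comet, and it assumes UTF-8 files.

```python
from pathlib import Path
from typing import Dict, List


def load_parallel(src_path: str, mt_path: str) -> List[Dict[str, str]]:
    """Read line-aligned source/translation files into COMET-style dicts.

    Hypothetical helper: assumes UTF-8 text with one segment per line.
    """
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    mt_lines = Path(mt_path).read_text(encoding="utf-8").splitlines()
    # Misaligned files would silently pair the wrong segments, so fail loudly.
    if len(src_lines) != len(mt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} source vs "
            f"{len(mt_lines)} translation lines"
        )
    return [{"src": s, "mt": t} for s, t in zip(src_lines, mt_lines)]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src.txt")
        mt = Path(tmp, "mt.txt")
        src.write_text("Hello world.\nGood morning.\n", encoding="utf-8")
        mt.write_text("Hallo Welt.\nGuten Morgen.\n", encoding="utf-8")
        print(load_parallel(str(src), str(mt)))
```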
Advanced Usage
The model can also be called from Python:
```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub (requires prior login) and load it
model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)

# Each item needs only a source ("src") and a translation ("mt"); no reference is used
data = [
    {
        "src": "The output signal provides constant sync so the display never glitches.",
        "mt": "Das Ausgangssignal bietet eine konstante Synchronisation, so dass die Anzeige nie stört."
    },
    {
        "src": "Kroužek ilustrace je určen všem milovníkům umění ve věku od 10 do 15 let.",
        "mt": "Кільце ілюстрації призначене для всіх любителів мистецтва у віці від 10 до 15 років."
    },
    {
        "src": "Mandela then became South Africa's first black president after his African National Congress party won the 1994 election.",
        "mt": "その後、1994年の選挙でアフリカ国民会議派が勝利し、南アフリカ初の黒人大統領となった。"
    }
]

# Score in batches of 8 on one GPU (set gpus=0 to run on CPU)
model_output = model.predict(data, batch_size=8, gpus=1)
# Per-segment scores are in model_output.scores; the corpus-level score is model_output.system_score
print(model_output)
```
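In unbabel-comet 2.x, the object returned by `model.predict` exposes per-segment scores as `model_output.scores` and a corpus-level `model_output.system_score`, which to our understanding is the mean of the segment scores. The sketch below uses plain lists in place of the real output so that it runs without downloading the model. `summarize` is a hypothetical helper that attaches each score to its input and reports the mean and the weakest segment.

```python
from statistics import mean
from typing import Dict, List


def summarize(data: List[Dict[str, str]], scores: List[float]) -> Dict[str, object]:
    """Pair each input dict with its score and report the mean and worst segment.

    Hypothetical helper; with the real model you would pass model_output.scores.
    """
    if len(data) != len(scores):
        raise ValueError("data and scores must have the same length")
    # dict(item, score=...) copies each input, so the caller's data is not mutated.
    scored = [dict(item, score=score) for item, score in zip(data, scores)]
    worst = min(scored, key=lambda item: item["score"])
    return {"mean": mean(scores), "worst": worst, "segments": scored}


if __name__ == "__main__":
    data = [{"src": "a", "mt": "A"}, {"src": "b", "mt": "B"}]
    fake_scores = [0.75, 0.25]  # made-up numbers standing in for model_output.scores
    report = summarize(data, fake_scores)
    print(report["mean"], report["worst"])
    # -> 0.5 {'src': 'b', 'mt': 'B', 'score': 0.25}
```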
📄 License
This model is released under the cc-by-nc-sa-4.0 license.









