WMT22-COMET-DA Open-Source Machine Translation Evaluation Model - Free Support for Translation Quality Assessment of Multiple Language Pairs

wmt22-comet-da

Developed by Unbabel

COMET-22 is a machine translation evaluation model developed by Unbabel, based on the XLM-R architecture, supporting quality assessment for multiple language pairs.

Machine Translation, Supports Multiple Languages
Open Source License: Apache-2.0
#Multilingual Translation Evaluation #Reference-dependent Scoring #Machine Translation Quality Assessment

Downloads 6,939

Release Time: 2/10/2023

Model Overview

The model takes a triplet of source sentence, machine translation, and reference translation as input and outputs a score between 0 and 1 that reflects translation quality. It is used primarily for performance evaluation and quality control of machine translation systems.
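As a minimal sketch of the triplet input described above, the helper below zips three parallel lists into dictionaries with the keys `src`, `mt`, and `ref`, which are the keys used in the Python usage example on this page. The function name `build_triplets` is hypothetical and not part of the comet library.

```python
from typing import Dict, List


def build_triplets(
    sources: List[str], translations: List[str], references: List[str]
) -> List[Dict[str, str]]:
    """Zip three parallel lists into one dictionary per segment."""
    # The three lists must be aligned: item i of each list describes the same segment.
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError("sources, translations and references must have the same length")
    return [
        {"src": src, "mt": mt, "ref": ref}
        for src, mt, ref in zip(sources, translations, references)
    ]


triplets = build_triplets(
    ["Dem Feuer konnte Einhalt geboten werden"],
    ["The fire could be stopped"],
    ["They were able to control the fire."],
)
```

Checking the lengths up front helps catch misaligned test sets before any scoring is attempted.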

Model Features

Multilingual Support

Supports translation quality assessment for the languages covered by XLM-R (about 100)

Direct Assessment

Produces scores in the style of human Direct Assessment (DA) ratings, so no human scoring is needed at evaluation time; a reference translation is still required

High Correlation

Evaluation results are highly correlated with human judgments

Model Capabilities

Machine Translation Quality Scoring

Multilingual Translation Evaluation

Translation System Performance Comparison

Use Cases

Machine Translation Development

Translation System Optimization

Used to evaluate the output quality of different machine translation systems

Helps developers select the best translation model
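A rough sketch of how such a comparison could look once segment-level scores are available. It assumes that a system-level score is the mean of its segment scores and that all systems were scored on the same test set. The function name `rank_systems`, the system names, and the numbers are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_systems(segment_scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (system, mean score) pairs sorted from best to worst."""
    system_scores = {name: mean(scores) for name, scores in segment_scores.items()}
    return sorted(system_scores.items(), key=lambda item: item[1], reverse=True)


# Made-up segment scores for two hypothetical systems on the same three segments.
example_scores = {
    "system_a": [0.80, 0.90, 0.70],
    "system_b": [0.60, 0.95, 0.70],
}
ranking = rank_systems(example_scores)
best_system = ranking[0][0]
```

Small gaps between mean scores may not be meaningful on their own; the unbabel-comet package also provides a `comet-compare` command intended for statistically comparing systems.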

Translation Quality Control

Translation Service Monitoring

Continuously monitors the output quality of translation services

Ensures translation services maintain high-quality standards
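One way monitoring might be sketched: flag every segment whose score falls below a cutoff so that it can be reviewed. The threshold of 0.7, the function name `flag_low_quality`, and the scores are illustrative assumptions; a suitable cutoff depends on the language pair and domain.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[str, float]]


def flag_low_quality(
    triplets: List[Dict[str, str]], scores: List[float], threshold: float = 0.7
) -> List[Segment]:
    """Return the triplets scoring below `threshold`, each with its score attached."""
    if len(triplets) != len(scores):
        raise ValueError("need exactly one score per triplet")
    return [
        {**triplet, "score": score}
        for triplet, score in zip(triplets, scores)
        if score < threshold
    ]


# Made-up scores for two segments; only the second falls below the threshold.
batch = [
    {"src": "Guten Morgen", "mt": "Good morning", "ref": "Good morning"},
    {"src": "Guten Abend", "mt": "Good morning", "ref": "Good evening"},
]
flagged = flag_low_quality(batch, [0.85, 0.42])
```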

🚀 COMET Evaluation Model

This is a COMET evaluation model. It takes a triplet of (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both the source and the reference.

📚 Documentation

Paper

COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task (Rei et al., WMT 2022)

License

Apache-2.0

Model Information

Property	Details
Pipeline Tag	translation
Library Name	comet
Language	multilingual, af, am, ar, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fr, fy, ga, gd, gl, gu, ha, he, hi, hr, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lo, lt, lv, mg, mk, ml, mn, mr, ms, my, ne, nl, no, om, or, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, so, sq, sr, su, sv, sw, ta, te, th, tl, tr, ug, uk, ur, uz, vi, xh, yi, zh
License	apache-2.0
Base Model	FacebookAI/xlm-roberta-large

📦 Installation

Using this model requires unbabel-comet to be installed:

pip install --upgrade pip  # ensures that pip is current 
pip install unbabel-comet

💻 Usage Examples

Basic Usage

You can use it through the comet CLI:

comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model Unbabel/wmt22-comet-da
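The three input files are expected to be line-aligned, with one segment per line, so that line i of each file belongs to the same triplet. The sketch below writes such files from a list of triplets. The helper name `write_cli_inputs` and the file names `src.txt`, `mt.txt`, and `ref.txt` are hypothetical choices, not something the CLI requires.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def write_cli_inputs(triplets: List[Dict[str, str]], out_dir: str) -> Dict[str, Path]:
    """Write src/mt/ref text files with one segment per line, aligned across files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {key: out / f"{key}.txt" for key in ("src", "mt", "ref")}
    for key, path in paths.items():
        # Collapse internal whitespace and newlines so every segment stays on one line.
        lines = [" ".join(triplet[key].split()) for triplet in triplets]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return paths


example = [
    {"src": "Dem Feuer konnte Einhalt geboten werden",
     "mt": "The fire could be stopped",
     "ref": "They were able to control the fire."},
    {"src": "Schulen und Kindergärten wurden eröffnet.",
     "mt": "Schools and kindergartens were open",
     "ref": "Schools and kindergartens opened"},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_cli_inputs(example, tmp)
    line_counts = {
        key: len(path.read_text(encoding="utf-8").splitlines())
        for key, path in paths.items()
    }
```

The resulting files can then be passed to `comet-score` through the `-s`, `-t`, and `-r` options shown above.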

Advanced Usage

Using Python:

from comet import download_model, load_from_checkpoint

# Download the checkpoint and load the model
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# Each item is one (source, machine translation, reference) triplet
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]

# gpus=1 scores on a single GPU; use gpus=0 to run on CPU
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
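The following sketch shows one way the prediction output might be read and reported. It assumes unbabel-comet 2.x, where the object returned by `predict` exposes `scores` (one value per segment) and `system_score`; older releases may return a different structure. The helpers `score_with_comet` and `format_report` are hypothetical names, and the comet import is kept inside the function so that the reporting helper works without the package installed.

```python
from typing import Dict, List, Tuple


def score_with_comet(data: List[Dict[str, str]], gpus: int = 0) -> Tuple[List[float], float]:
    """Score triplets with wmt22-comet-da and return (segment scores, system score)."""
    # Lazy import: only needed when scoring is actually requested.
    from comet import download_model, load_from_checkpoint

    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    output = model.predict(data, batch_size=8, gpus=gpus)
    # Assumption: unbabel-comet 2.x attribute names.
    return output.scores, output.system_score


def format_report(
    data: List[Dict[str, str]], scores: List[float], system_score: float
) -> List[str]:
    """Build one tab-separated line per segment plus a final system-level line."""
    lines = [f"{score:.4f}\t{item['mt']}" for item, score in zip(data, scores)]
    lines.append(f"system\t{system_score:.4f}")
    return lines


# Usage (downloads the checkpoint, so it is left commented out here):
# scores, system_score = score_with_comet(data)
# print("\n".join(format_report(data, scores, system_score)))
```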

🔍 Intended Uses

Our model is intended to be used for MT evaluation. Given a triplet of (source sentence, translation, reference translation), it outputs a single score between 0 and 1, where 1 represents a perfect translation.

🌐 Languages Covered

This model builds on XLM-R, which covers the following languages:

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

⚠️ Important Note

Results for language pairs containing uncovered languages are unreliable!
