M-Prometheus-7Bオープンソース評価モデル - 無料で利用可能、多言語出力評価をサポート

ホーム

M Prometheus 7B

Unbabelによって開発

M-PrometheusはオープンソースのLLM評価モデルで、多言語出力の評価をネイティブにサポートします。48万件の多言語直接評価とペア比較データに基づいてトレーニングされています。

大規模言語モデル

Transformers

オープンソースライセンス:その他 #多言語評価 #LLM品質評価 #翻訳品質スコアリング

ダウンロード数 238

リリース時間 : 4/7/2025

モデル概要

オープンソースの多言語LLM評価キットで、多言語出力の評価をサポートし、Prometheus-2と互換性があります。

モデル特徴

多言語評価

多言語出力の評価をネイティブにサポートし、48万件の多言語データでトレーニングされています

互換性

使用方法はPrometheus-2と完全に互換性があります

長文フィードバック

長文フィードバックを含む評価をサポートします

モデル能力

多言語テキスト評価

機械翻訳品質評価

詳細な評価フィードバックの生成

使用事例

機械翻訳評価

翻訳品質評価

ソース言語からターゲット言語への翻訳品質を評価します

1-5点のスコアと詳細なフィードバックを提供します

LLM出力評価

多言語生成評価

多言語LLMの生成品質を評価します

正確性、流暢さ、スタイルなどの多面的な評価を提供します

🚀 M-Prometheus

M-Prometheusは、マルチリンガルな出力をネイティブに評価できるオープンな大規模言語モデル（LLM）の評価器のセットです。これらは、48万件のマルチリンガルの直接評価とペアワイズ比較のデータと長文のフィードバックを用いて訓練されています。これらは、Prometheus-2と同じ方法でプロンプトを与えることができます。詳細については、私たちの論文をご覧ください。

🚀 クイックスタート

当社のモデルは、Prometheus-2と同じ方法でプロンプトを与えることができます。

💻 使用例

基本的な使用法

直接評価の機械翻訳評価には、以下のプロンプトを使用します。

"""###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)" 
4. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
Translate the following text from {source_language} to {target_language}: {source}

###Response to evaluate:
{hypothesis}

###Reference Answer (Score 5):
{reference}

###Score Rubrics: [Accuracy, Fluency, Style]
Score 1: The translation contains major errors that significantly alter the meaning of the source text. It is barely comprehensible and reads like a poor machine translation. The style is completely inconsistent with the source text.
Score 2: The translation has several inaccuracies that affect the overall meaning. It is difficult to read and understand, with frequent awkward phrasings. The style only occasionally matches the source text.
Score 3: The translation is mostly accurate but has some minor errors that don't significantly alter the meaning. It is generally understandable but lacks natural flow in some parts. The style is somewhat consistent with the source text.
Score 4: The translation is accurate with only a few negligible errors. It reads naturally for the most part, with occasional minor awkwardness. The style largely matches that of the source text.
Score 5: The translation is highly accurate, conveying the full meaning of the source text. It reads as fluently as an original text in the target language. The style perfectly captures the tone and register of the source text.

###Feedback:
"""

📄 ライセンス

ライセンスは、otherです。

📚 引用

@misc{pombal2025mprometheussuiteopenmultilingual,
      title={M-Prometheus: A Suite of Open Multilingual LLM Judges}, 
      author={José Pombal and Dongkeun Yoon and Patrick Fernandes and Ian Wu and Seungone Kim and Ricardo Rei and Graham Neubig and André F. T. Martins},
      year={2025},
      eprint={2504.04953},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.04953}, 
}