M-Prometheus-7B開源評估模型 - 免費使用，支持多語言輸出評估

首頁

M Prometheus 7B

由Unbabel開發

M-Prometheus是一套開源的LLM評估模型，能夠原生支持多語言輸出的評估。基於48萬條多語言直接評估和成對比較數據訓練而成。

大型語言模型

Transformers

開源協議:其他 #多語言評估 #LLM質量評測 #翻譯質量評分

下載量 238

發布時間 : 4/7/2025

模型概述

開源多語言LLM評估套件，支持多語言輸出的評估，與Prometheus-2兼容。

模型特點

多語言評估

原生支持多語言輸出的評估，基於48萬條多語言數據訓練

兼容性

使用方式與Prometheus-2完全兼容

長文本反饋

支持包含長文本反饋的評估

模型能力

多語言文本評估

機器翻譯質量評估

生成詳細評估反饋

使用案例

機器翻譯評估

翻譯質量評估

評估從源語言到目標語言的翻譯質量

提供1-5分的評分及詳細反饋

LLM輸出評估

多語言生成評估

評估多語言LLM的生成質量

提供準確性、流暢度、風格等多維度評估

🚀 M-Prometheus

M-Prometheus是一套開源的大語言模型評估器，能夠原生評估多語言輸出。它們在48萬個多語言直接評估和成對比較實例數據上進行訓練，並帶有詳細反饋。可以像使用Prometheus-2一樣對其進行提示。更多詳細信息請查看我們的論文。

🚀 快速開始

M-Prometheus可用於原生評估多語言輸出，為多語言評估提供了有效的解決方案。

✨ 主要特性

能夠原生評估多語言輸出。
在480k實例的多語言直接評估和成對比較數據上進行訓練，並帶有長格式反饋。
可以像Prometheus-2一樣進行提示。

💻 使用示例

基礎用法

"""###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 
1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 
2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 
3. The output format should look as follows: "Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)" 
4. Please do not generate any other opening, closing, and explanations.

###The instruction to evaluate:
Translate the following text from {source_language} to {target_language}: {source}

###Response to evaluate:
{hypothesis}

###Reference Answer (Score 5):
{reference}

###Score Rubrics: [Accuracy, Fluency, Style]
Score 1: The translation contains major errors that significantly alter the meaning of the source text. It is barely comprehensible and reads like a poor machine translation. The style is completely inconsistent with the source text.
Score 2: The translation has several inaccuracies that affect the overall meaning. It is difficult to read and understand, with frequent awkward phrasings. The style only occasionally matches the source text.
Score 3: The translation is mostly accurate but has some minor errors that don't significantly alter the meaning. It is generally understandable but lacks natural flow in some parts. The style is somewhat consistent with the source text.
Score 4: The translation is accurate with only a few negligible errors. It reads naturally for the most part, with occasional minor awkwardness. The style largely matches that of the source text.
Score 5: The translation is highly accurate, conveying the full meaning of the source text. It reads as fluently as an original text in the target language. The style perfectly captures the tone and register of the source text.

###Feedback:
"""

📄 許可證

許可證類型：其他

📚 詳細文檔

屬性	詳情
庫名稱	transformers
基礎模型	Qwen/Qwen2.5-7B-Instruct

📚 引用

@misc{pombal2025mprometheussuiteopenmultilingual,
      title={M-Prometheus: A Suite of Open Multilingual LLM Judges}, 
      author={José Pombal and Dongkeun Yoon and Patrick Fernandes and Ian Wu and Seungone Kim and Ricardo Rei and Graham Neubig and André F. T. Martins},
      year={2025},
      eprint={2504.04953},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.04953}, 
}