Prometheus-13B-V1.0開源評估型語言模型 - 免費定製評估文本，替代GPT-4

首頁

Prometheus 13b V1.0

由prometheus-eval開發

普羅米修斯是基於Llama-2-Chat微調的評估型語言模型，擅長根據定製標準評估文本質量，可作為GPT-4評估的經濟替代方案。

大型語言模型

Transformers

英語開源協議:Apache-2.0 #細粒度評估 #RLHF獎勵模型 #GPT-4替代方案

下載量 1,726

發布時間 : 10/12/2023

模型概述

該模型經過10萬條反饋數據微調，能根據參考答案和評分標準對文本回答進行細粒度評估，性能接近GPT-4。同時可作為RLHF的獎勵模型。

模型特點

細粒度評估能力

通過參考答案和定製評分標準實現比通用模型更精確的文本評估

經濟高效替代方案

評估性能超越GPT-3.5-Turbo，與GPT-4持平但成本更低

多標準適應性

支持兒童可讀性、文化敏感性、創造力等定製化評估標準

模型能力

文本質量評估

反饋生成

獎勵建模

多維度評分

使用案例

模型評估

大語言模型輸出評估

根據特定標準評估不同LLM生成文本的質量

在多項基準測試中與GPT-4評估結果高度一致

強化學習

RLHF獎勵模型

為人類反饋強化學習提供自動化的獎勵信號

🚀 Prometheus模型

Prometheus是一種用於對基礎大語言模型（LLM）進行細粒度評估的工具，可作為GPT - 4評估的替代方案。同時，它還能作為基於人類反饋的強化學習（RLHF）中的獎勵模型，為模型訓練提供有價值的反饋。

🚀 快速開始

Prometheus是一個以[Llama - 2 - Chat](https://huggingface.co/meta - llama/Llama - 2 - 13b - chat - hf)為基礎模型的語言模型，它在[反饋集合](https://huggingface.co/datasets/kaist - ai/Feedback - Collection)中的100K反饋數據上進行了微調。由於在大量反饋數據上進行了微調，它在評估長篇回覆方面表現出色，在各種基準測試中，性能優於GPT - 3.5 - Turbo和Llama - 2 - Chat 70B，與GPT - 4相當。最重要的是，通過附加2份參考材料（參考答案和定製評分標準），使得該模型的評估更加準確和有效。Prometheus是GPT - 4評估的一種經濟高效且強大的替代方案，用戶可以使用它根據定製標準（如兒童可讀性、文化敏感性、創造力）評估大語言模型。此外，它還可以用作基於人類反饋的強化學習（RLHF）的獎勵模型。

✨ 主要特性

細粒度評估：能夠對基礎大語言模型進行細粒度評估，作為GPT - 4評估的替代方案。
獎勵模型：可作為基於人類反饋的強化學習（RLHF）中的獎勵模型。
性能出色：在評估長篇回覆方面表現出色，優於GPT - 3.5 - Turbo和Llama - 2 - Chat 70B，與GPT - 4相當。
定製評估：可以根據定製標準（如兒童可讀性、文化敏感性、創造力）評估大語言模型。

📦 安裝指南

文檔未提供具體安裝步驟，可參考相關代碼示例中的依賴安裝部分，如：

pip install accelerate
pip install bitsandbytes accelerate

💻 使用示例

基礎用法

以下是在transformers庫中使用Prometheus模型的示例腳本，展示瞭如何在CPU上運行該模型：

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto")

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

高級用法

在GPU上運行模型

# pip install accelerate
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto")

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, sample=True, temperature=1.0, top_p=0.9, max_new_tokens=256, repetition_penalty=1.03)
print(tokenizer.decode(outputs[0]))

在GPU上使用不同精度運行模型

FP16精度

# pip install accelerate
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto", torch_dtype=torch.float16)

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

INT8精度

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto", load_in_8bit=True)

input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

📚 詳細文檔

參考鏈接

主頁：https://github.com/kaistAI/Prometheus
倉庫：https://github.com/kaistAI/Prometheus
論文：https://arxiv.org/abs/2310.08491
聯繫人：seungone@kaist.ac.kr

模型詳情

模型描述

屬性	詳情
模型類型	語言模型
語言 (NLP)	英語
許可證	Apache 2.0
相關模型	[所有Prometheus檢查點](https://huggingface.co/models?search=kaist - ai/Prometheus)
更多信息資源	研究論文、GitHub倉庫

Prometheus有兩種不同大小的模型（7B和13B）。你可以在[此頁面](https://huggingface.co/kaist - ai/prometheus - 7b - v1.0)查看7B大小的語言模型。同時，也可以在[此頁面](https://huggingface.co/datasets/kaist - ai/Feedback - Collection)查看數據集。

提示格式

Prometheus的輸入需要包含4個組件：指令、待評估的回覆、評分標準和參考答案。你可以參考以下提示格式：

###任務描述:
給定一個指令（可能包含輸入）、一個待評估的回覆、一個得分為5的參考答案以及一個代表評估標準的評分標準。
1. 嚴格根據給定的評分標準，撰寫一份詳細的反饋，評估回覆的質量，而不是進行一般性評估。
2. 撰寫反饋後，給出一個1到5之間的整數評分。你應該參考評分標準。
3. 輸出格式應如下所示："反饋: (為標準撰寫的反饋) [RESULT] (1到5之間的整數)"
4. 請不要生成任何其他開頭、結尾和解釋。

###待評估的指令:
{instruction}

###待評估的回覆:
{response}

###參考答案 (得分5):
{reference_answer}

###評分標準:
[{criteria_description}]
得分1: {score1_description}
得分2: {score2_description}
得分3: {score3_description}
得分4: {score4_description}
得分5: {score5_description}

###反饋:

之後，你需要應用Llama - 2 - Chat的對話模板（不應用可能會導致意外行為）。你可以在[此鏈接](https://github.com/lm - sys/FastChat/blob/main/fastchat/conversation.py)找到對話類。

conv = get_conv_template("llama-2")
conv.set_system_message("You are a fair evaluator language model.")
conv.append_message(conv.roles[0], dialogs['instruction'])
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

x = tokenizer(prompt,truncation=False)

最終，會生成一個反饋和評分，以分隔短語[RESULT] 分隔。

許可證

反饋集合和Prometheus生成的數據需遵循OpenAI的使用條款。如果你懷疑有任何違規行為，請聯繫我們。

🔧 技術細節

Prometheus以[Llama - 2 - Chat](https://huggingface.co/meta - llama/Llama - 2 - 13b - chat - hf)為基礎模型，在[反饋集合](https://huggingface.co/datasets/kaist - ai/Feedback - Collection)中的100K反饋數據上進行了微調。通過附加2份參考材料（參考答案和定製評分標準），使得模型在評估長篇回覆方面表現出色。在各種基準測試中，其性能優於GPT - 3.5 - Turbo和Llama - 2 - Chat 70B，與GPT - 4相當。

📄 許可證

反饋集合和Prometheus生成的數據需遵循OpenAI的使用條款。如果你懷疑有任何違規行為，請聯繫我們。

📚 引用

如果你發現該模型有幫助，請考慮引用我們的論文！

BibTeX：

@misc{kim2023prometheus,
    title={Prometheus: Inducing Fine-grained Evaluation Capability in Language Models},
    author={Seungone Kim and Jamin Shin and Yejin Cho and Joel Jang and Shayne Longpre and Hwaran Lee and Sangdoo Yun and Seongjin Shin and Sungdong Kim and James Thorne and Minjoon Seo},
    year={2023},
    eprint={2310.08491},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}