モデル概要
モデル特徴
モデル能力
使用事例
🚀 プロメテウス (Prometheus)
プロメテウスは、潜在的な大規模言語モデル(LLM)の細粒度評価を行う際のGPT - 4評価の代替手段であり、人間のフィードバックからの強化学習(RLHF)の報酬モデルです。
🚀 クイックスタート
参考リンク
- ホームページ: https://github.com/kaistAI/Prometheus
- リポジトリ: https://github.com/kaistAI/Prometheus
- 論文: https://arxiv.org/abs/2310.08491
- 問い合わせ先: seungone@kaist.ac.kr
概要
プロメテウスは、Llama - 2 - Chat をベースモデルとして使用し、Feedback Collection 内の10万件のフィードバックでファインチューニングされた言語モデルです。大量のフィードバックでファインチューニングされているため、長文の応答を評価するのに特化しており、GPT - 3.5 - TurboやLlama - 2 - Chat 70Bを上回り、様々なベンチマークでGPT - 4と同等の性能を発揮します。最も重要なのは、2つの参考資料(参考回答とカスタマイズされたスコアルーブリック)を追加したことでこれが可能になったという点です。プロメテウスは、GPT - 4評価の安価で強力な代替手段であり、カスタム基準(例:子供向けの可読性、文化的感受性、創造性)でLLMを評価するために使用できます。また、人間のフィードバックからの強化学習(RLHF)の報酬モデルとしても使用できます。
✨ 主な機能
- 潜在的なLLMの細粒度評価において、GPT - 4評価の代替手段として機能する。
- 人間のフィードバックからの強化学習(RLHF)の報酬モデルとして使用できる。
- 長文の応答を評価するのに特化しており、様々なベンチマークで高い性能を発揮する。
📚 ドキュメント
モデル詳細
モデル説明
属性 | 詳情 |
---|---|
モデルタイプ | 言語モデル |
言語 (NLP) | 英語 |
ライセンス | Apache 2.0 |
関連モデル | All Prometheus Checkpoints |
詳細情報リソース | 研究論文、GitHubリポジトリ |
プロメテウスは2つの異なるサイズ(7Bと13B)で学習されています。7BサイズのLMは このページ で確認できます。また、データセットは このページ で確認できます。
プロンプト形式
プロメテウスの入力には4つの要素が必要です:指示、評価する応答、スコアルーブリック、および参考回答。以下のプロンプト形式を参照してください。指示、応答、参考回答、基準の説明、および1から5のスコアの説明を記入する必要があります。
###タスクの説明:
指示(入力を含む場合もあります)、評価する応答、スコア5を得る参考回答、および評価基準を表すスコアルーブリックが与えられます。
1. 与えられたスコアルーブリックに基づいて、応答の品質を厳密に評価する詳細なフィードバックを書いてください。一般的な評価ではなく、基準に沿った評価を行ってください。
2. フィードバックを書いた後、1から5の整数のスコアを書いてください。スコアルーブリックを参照してください。
3. 出力形式は次のようになる必要があります: "Feedback: (基準に対するフィードバックを書く) [RESULT] (1から5の整数)"
4. 他の開頭、閉じ、および説明を生成しないでください。
###評価する指示:
{指示}
###評価する応答:
{応答}
###参考回答 (スコア5):
{参考回答}
###スコアルーブリック:
[{基準の説明}]
スコア1: {スコア1の説明}
スコア2: {スコア2の説明}
スコア3: {スコア3の説明}
スコア4: {スコア4の説明}
スコア5: {スコア5の説明}
###フィードバック:
この後、Llama - 2 - Chatの会話テンプレートを適用する必要があります(適用しないと予期しない動作につながる可能性があります)。会話クラスは このリンク で見つけることができます。
conv = get_conv_template("llama-2")
conv.set_system_message("You are a fair evaluator language model.")
conv.append_message(conv.roles[0], dialogs['instruction'])
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
x = tokenizer(prompt,truncation=False)
結果として、フィードバックとスコアの決定が生成され、区切りフレーズ [RESULT]
で区切られます。
ライセンス
Feedback Collectionおよびプロメテウスは、生成されたデータに関するOpenAIの利用規約の対象となります。違反の疑いがある場合は、お問い合わせください。
💻 使用例
基本的な使用法
transformers
でモデルを使用する方法の例を以下に示します。
CPUでモデルを実行する
展開するにはクリック
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto")
input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
GPUでモデルを実行する
展開するにはクリック
# pip install accelerate
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = LlamaForCausalLM.from_pretrained("kaist-ai/Prometheus-13b-v1.0", device_map="auto")
input_text = "###Task Description: An instruction (might include an Input inside it), a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing a evaluation criteria are given. 1. Write a detailed feedback that assess the quality of the response strictly based on the given score rubric, not evaluating in general. 2. After writing a feedback, write a score that is an integer between 1 and 5. You should refer to the score rubric. 3. The output format should look as follows: \"Feedback: (write a feedback for criteria) [RESULT] (an integer number between 1 and 5)\" 4. Please do not generate any other opening, closing, and explanations. ###The instruction to evaluate: You are a well-known psychiatrist who has a reputation for being empathetic and understanding. A client comes to you saying they've had a really hard day at work. They describe their boss as being overly critical and not listening to their ideas, which has left them feeling frustrated and undervalued. They also mention feeling overwhelmed with the workload and a growing sense of loneliness as they are new to the city. How do you respond to them to make them feel heard and understood, as well as offering helpful advice? ###Response to evaluate: It's indeed challenging to deal with a difficult boss and to carry a large workload, especially when you are new to a city and haven't established a supportive network. I would suggest taking a step back to think about your situation and perhaps speak to your boss regarding the issues you're facing. On the other hand, dealing with a large workload can be managed by prioritizing tasks, and maybe you can discuss it with your superiors too. In regards to feeling lonely, you might want to explore activities or groups that align with your interests to meet new people. And also, remember to take time for yourself to relax and destress. I'm here to assist you as we navigate through these issues. ###Reference Answer (Score 5): I hear that you're dealing with quite a few challenging situations at once. It must be tough to have your ideas dismissed by your boss and to feel undervalued in your work environment. It's understandable that such experiences can lead to frustration and a feeling of being overwhelmed. It's important to acknowledge these feelings and not to dismiss them. Feeling heard and validated in our experiences can often be the first step towards resolving the challenges we face. So, I want you to know that your feelings are completely valid and it's okay to be frustrated and overwhelmed. In terms of dealing with your boss, have you considered seeking a private conversation with them to express your feelings? It's often beneficial to communicate what you're experiencing, using \"I\" statements to express how you feel when your ideas are not taken into consideration. This can lead to mutual understanding and possibly a change in behavior. About the workload, it might help to prioritize tasks and potentially delegate, if possible. Also, consider discussing your workload with your superiors. There might be a misunderstanding about what's manageable or they might have suggestions about how to handle the situation. On the personal front, feeling lonely, especially when you're new to a city, can be really hard. Seek out opportunities to meet new people, perhaps through hobbies, community activities, or online groups. It might take a bit of time, but gradually, you can build a network of friends and acquaintances. Remember, it's perfectly okay to have bad days and it's important to take care of your mental health. Consider incorporating activities into your daily routine that make you happy and help you unwind. This could be anything from reading, yoga, going for a walk, or even listening to your favorite music. Please know that you're not alone in this. I'm here to support you through this challenging time and together, we can work towards resolving these issues. ###Score Rubrics: [Is the model able to identify and react correctly to the emotional context of the user's input?] Score 1: The model utterly fails to grasp the user's emotional context and responds in an unfitting manner. Score 2: The model sporadically identifies the emotional context but frequently replies in a manner that doesn't match the user's emotional status. Score 3: The model typically identifies the emotional context and reacts suitably, but occasionally misreads or misjudges the user's feelings. Score 4: The model often identifies the emotional context and reacts suitably, with minor cases of misreading or misjudging. Score 5: The model flawlessly identifies the emotional context of the user's input and consistently responds in a considerate and empathetic manner. ###Feedback:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, sample=True, temperature=1.0, top_p=0.9, max_new_tokens
📄 ライセンス
Feedback Collectionおよびプロメテウスは、生成されたデータに関するOpenAIの利用規約の対象となります。違反の疑いがある場合は、お問い合わせください。



