🚀 Interpreting Language Model Preferences with Decision Trees
This project interprets language model preferences through the lens of decision trees, offering a new perspective and new tools for evaluating and optimizing language models.
🚀 Quick Start
Before using the model, make sure the following dependencies are installed:
```
transformers==4.45.2
torch>=2.5.0
flash-attn>=2.6.3
```
Note: this code requires a GPU with the NVIDIA Ampere architecture or newer.
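If you are unsure whether your GPU meets this requirement, the following minimal sketch checks the CUDA compute capability with PyTorch; Ampere and newer devices report a major version of 8 or above, which is what `flash_attention_2` and bfloat16 inference expect here.

```python
import torch

# Ampere GPUs (e.g., A100, RTX 30xx) report compute capability 8.x or higher.
if not torch.cuda.is_available():
    raise RuntimeError("No CUDA device detected; this model requires an NVIDIA GPU.")

major, minor = torch.cuda.get_device_capability(0)
print(f"Detected compute capability {major}.{minor}")
if major < 8:
    print("Warning: this GPU predates Ampere; flash_attention_2 may not be supported.")
```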
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Decision-Tree-Reward-Gemma-2-27B"
repo_id = f"RLHFlow/{model_name}"
device = "cuda"

# Load the reward model; trust_remote_code is required for the custom model class.
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)

# Load the decision tree that turns per-attribute rewards into a preference.
model.load_decision_tree(repo_id, filename="decision_tree.pkl")

# Compare two candidate responses to the same prompt.
prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
output = model.compare(prompt, response1, response2, tokenizer, device)

# Per-attribute rewards for each response, plus the decision tree's preference.
print("Response 1 rewards")
print(dict(zip(output["attributes"], output["rewards"][0])))
print("Response 2 rewards")
print(dict(zip(output["attributes"], output["rewards"][1])))
print("Model preference")
print(output["preference"])
```
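To compare responses across more than one prompt, a small loop over `model.compare` is enough. The sketch below is a hypothetical extension of the example above, not part of the original card: it reuses `prompt`, `response1`, and `response2` from the previous block, assumes `output["attributes"]` and `output["rewards"]` have exactly the structure shown in the print statements, and `eval_pairs` is made-up illustrative data.

```python
from collections import defaultdict

# Hypothetical evaluation set: (prompt, response_a, response_b) triples.
eval_pairs = [
    (prompt, response1, response2),
    # ... append more triples here
]

# Accumulate per-attribute rewards separately for the two response slots.
totals = {0: defaultdict(float), 1: defaultdict(float)}
for p, ra, rb in eval_pairs:
    out = model.compare(p, ra, rb, tokenizer, device)
    for slot in (0, 1):
        for attr, reward in zip(out["attributes"], out["rewards"][slot]):
            totals[slot][attr] += float(reward)

n = len(eval_pairs)
for slot, label in ((0, "Response A"), (1, "Response B")):
    print(label, "average rewards:", {attr: total / n for attr, total in totals[slot].items()})
```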
✨ Key Features
- Model rankings: shows how different models rank on the RewardBench leaderboard (January 2025), including overall scores and scores on each category.
- Code examples: detailed usage examples make it easy to get started with response comparison and reward evaluation.
📚 Documentation
Author Information
- Author: Min Li
- Blog: https://rlhflow.github.io/posts/2025-01-22-decision-tree-reward-model/
Model Information
Code Repository
- Repository: https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/main/decision_tree
Technical Report
RewardBench Leaderboard (January 2025)
| Rank | Model | Base Model | Method | Overall Score | Chat | Chat Hard | Safety | Reasoning |
|------|-------|------------|--------|---------------|------|-----------|--------|-----------|
| 1 | Decision-Tree-Reward-Gemma-2-27B | Gemma-2-27B | Decision Tree | 95.4 | 96.9 | 91.4 | 93.9 | 99.2 |
| 2 | INF-QRM-Llama3.1-70B | Llama-3.1-70B | Sequence Classifier | 95.1 | 96.6 | 91.0 | 93.6 | 99.1 |
| 3 | Decision-Tree-Reward-Llama-3.1-8B | Llama-3.1-8B | Decision Tree | 94.5 | 96.6 | 89.5 | 93.2 | 98.6 |
| 4 | QRM-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 94.4 | 96.6 | 90.1 | 92.7 | 98.3 |
| 5 | Skywork-Reward-Gemma-2-27B-v0.2 | Gemma-2-27B | Sequence Classifier | 94.3 | 96.1 | 89.9 | 93.0 | 98.1 |
| 6 | Llama-3.1-Nemotron-70B-Reward | Llama-3.1-70B | Custom Classifier | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
| 7 | Skywork-Reward-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 93.8 | 95.8 | 91.4 | 91.9 | 96.1 |
| 8 | TextEval-Llama3.1-70B | Llama-3.1-70B | Generative | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| 9 | MetaMetrics-RM-v1.0 | - | Custom Classifier | 93.4 | 98.3 | 86.4 | 90.8 | 98.2 |
| 10 | Skywork-Critic-Llama-3.1-70B | Llama-3.1-70B | Generative | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
| 11 | QRM-Llama3.1-8B-v2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 96.4 | 86.8 | 92.6 | 96.8 |
| 12 | Skywork-Reward-Llama-3.1-8B-v0.2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 94.7 | 88.4 | 92.7 | 96.7 |
📄 License
⚠️ Important Notice
This model is fine-tuned from a Skywork model, so its use must follow the Skywork Community License: community use of Skywork models is governed by that license, and Skywork models do support commercial use. If you plan to use a Skywork model or any derivative of it for commercial purposes, you must comply with the terms and conditions of the Skywork Community License.
📋 TODO