🚀 Interpreting Language Model Preferences with Decision Trees
This project interprets language model preferences through the lens of decision trees, offering a new perspective and new methods for evaluating and optimizing language models.
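At a high level, the idea is to score each response on several reward attributes and let a decision tree turn those attribute scores into a preference. The sketch below is purely conceptual (it is not the project's training code) and assumes scikit-learn is available; the attribute names and data are illustrative only.

```python
# Conceptual sketch only, NOT the project's actual training code.
# A small decision tree maps per-attribute reward differences
# (response 1 minus response 2) to a binary preference label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

attributes = ["helpfulness", "correctness", "coherence"]  # hypothetical attribute names
# Each row: attribute reward differences for one response pair; label 1 = first response preferred
X = np.array([
    [0.8, 1.2, 0.1],
    [-0.5, -0.9, 0.2],
    [0.3, 0.6, -0.1],
])
y = np.array([1, 0, 1])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(dict(zip(attributes, tree.feature_importances_)))  # which attributes drive the preference
print(tree.predict([[0.4, 0.7, 0.0]]))  # predicted preference for a new response pair
```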
🚀 Quick Start
Before using the model, make sure the following dependencies are installed:
```
transformers==4.45.2
torch>=2.5.0
flash-attn>=2.6.3
```
Note: this code requires a GPU with the NVIDIA Ampere architecture or newer.
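If you want to verify this before loading the model, a quick pre-flight check of the CUDA compute capability works. This is a sketch under the assumption that Ampere and newer GPUs report compute capability 8.0 or higher, which is what the bfloat16 + FlashAttention path below relies on:

```python
import torch

# Assumption: Ampere (SM80) and newer GPUs report CUDA compute capability >= 8.0.
if not torch.cuda.is_available():
    raise RuntimeError("A CUDA-capable GPU is required")
major, minor = torch.cuda.get_device_capability()
if major < 8:
    raise RuntimeError(f"Ampere (SM80) or newer GPU expected, found compute capability {major}.{minor}")
```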
💻 Usage Example
Basic Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Decision-Tree-Reward-Gemma-2-27B"
repo_id = f"RLHFlow/{model_name}"
device = "cuda"

# Load the reward model (bfloat16 with FlashAttention 2) and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained(repo_id, trust_remote_code=True, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)

# Load the decision tree that maps attribute rewards to a preference
model.load_decision_tree(repo_id, filename="decision_tree.pkl")

# A prompt and two candidate responses (the second one makes an arithmetic mistake)
prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

# Score both responses on all reward attributes and compare them
output = model.compare(prompt, response1, response2, tokenizer, device)
print("Response 1 rewards")
print(dict(zip(output["attributes"], output["rewards"][0])))
print("Response 2 rewards")
print(dict(zip(output["attributes"], output["rewards"][1])))
print("Model preference")
print(output["preference"])
```
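To turn the comparison into a simple verdict, the returned `preference` value can be thresholded. The follow-up below is a sketch: it assumes `preference` is the probability that the first response is preferred, so check its exact semantics against the repository before relying on it. In this example the first response does the arithmetic correctly (3 apples each), so a well-behaved reward model should prefer it.

```python
# Assumption: output["preference"] is the probability that response1 is
# preferred over response2 (confirm against the repository documentation).
prob_first = float(output["preference"])
winner = "Response 1" if prob_first > 0.5 else "Response 2"
print(f"{winner} is preferred (p = {prob_first:.3f})")
```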
✨ Key Features
- Leaderboard results: shows how different models rank on the RewardBench leaderboard (January 2025), including overall scores and scores on each individual category.
- Code example: provides a detailed, ready-to-run usage example so users can quickly start comparing responses and evaluating rewards with the model.
📚 Documentation
Author Information
- Author: Min Li
- Blog: https://rlhflow.github.io/posts/2025-01-22-decision-tree-reward-model/
Model Information
Code Repository
- Repository: https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/main/decision_tree
Technical Report
RewardBench Leaderboard (January 2025)
| Rank | Model | Base Model | Method | Overall Score | Chat | Chat Hard | Safety | Reasoning |
|------|-------|------------|--------|---------------|------|-----------|--------|-----------|
| 1 | Decision-Tree-Reward-Gemma-2-27B | Gemma-2-27B | Decision Tree | 95.4 | 96.9 | 91.4 | 93.9 | 99.2 |
| 2 | INF-QRM-Llama3.1-70B | Llama-3.1-70B | Sequence Classifier | 95.1 | 96.6 | 91.0 | 93.6 | 99.1 |
| 3 | Decision-Tree-Reward-Llama-3.1-8B | Llama-3.1-8B | Decision Tree | 94.5 | 96.6 | 89.5 | 93.2 | 98.6 |
| 4 | QRM-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 94.4 | 96.6 | 90.1 | 92.7 | 98.3 |
| 5 | Skywork-Reward-Gemma-2-27B-v0.2 | Gemma-2-27B | Sequence Classifier | 94.3 | 96.1 | 89.9 | 93.0 | 98.1 |
| 6 | Llama-3.1-Nemotron-70B-Reward | Llama-3.1-70B | Custom Classifier | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
| 7 | Skywork-Reward-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 93.8 | 95.8 | 91.4 | 91.9 | 96.1 |
| 8 | TextEval-Llama3.1-70B | Llama-3.1-70B | Generative | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| 9 | MetaMetrics-RM-v1.0 | - | Custom Classifier | 93.4 | 98.3 | 86.4 | 90.8 | 98.2 |
| 10 | Skywork-Critic-Llama-3.1-70B | Llama-3.1-70B | Generative | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
| 11 | QRM-Llama3.1-8B-v2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 96.4 | 86.8 | 92.6 | 96.8 |
| 12 | Skywork-Reward-Llama-3.1-8B-v0.2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 94.7 | 88.4 | 92.7 | 96.7 |
📄 License
⚠️ Important Notice
This model is fine-tuned from a Skywork model, so its use is subject to the Skywork Community License Agreement. Skywork models support commercial use, but if you plan to use Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License Agreement.
📋 To-Do