🚀 Interpreting Language Model Preferences with Decision Trees
This project interprets language model preferences through the lens of decision trees, offering a new perspective and new methods for evaluating and optimizing language models.
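At a high level, the idea is to score each response on several reward attributes and let a decision tree turn those attribute scores into a preference. The sketch below is purely conceptual (it is not the project's training code) and assumes scikit-learn is available; the attribute names and data are illustrative only.

```python
# Conceptual sketch only, NOT the project's actual training code.
# A small decision tree maps per-attribute reward differences
# (response 1 minus response 2) to a binary preference label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

attributes = ["helpfulness", "correctness", "coherence"]  # hypothetical attribute names
# Each row: attribute reward differences for one response pair; label 1 = first response preferred
X = np.array([
    [0.8, 1.2, 0.1],
    [-0.5, -0.9, 0.2],
    [0.3, 0.6, -0.1],
])
y = np.array([1, 0, 1])

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(dict(zip(attributes, tree.feature_importances_)))  # which attributes drive the preference
print(tree.predict([[0.4, 0.7, 0.0]]))  # predicted preference for a new response pair
```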
🚀 Quick Start
Before using the model, make sure the following dependencies are installed:
```
transformers==4.45.2
torch>=2.5.0
flash-attn>=2.6.3
```
Note: this code requires a GPU with the NVIDIA Ampere architecture or newer.
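If you want to verify this before loading the model, a quick pre-flight check of the CUDA compute capability works. This is a sketch under the assumption that Ampere and newer GPUs report compute capability 8.0 or higher, which is what the bfloat16 + FlashAttention path below relies on:

```python
import torch

# Assumption: Ampere (SM80) and newer GPUs report CUDA compute capability >= 8.0.
if not torch.cuda.is_available():
    raise RuntimeError("A CUDA-capable GPU is required")
major, minor = torch.cuda.get_device_capability()
if major < 8:
    raise RuntimeError(f"Ampere (SM80) or newer GPU expected, found compute capability {major}.{minor}")
```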
💻 Usage Example
Basic Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Decision-Tree-Reward-Gemma-2-27B"
repo_id = f"RLHFlow/{model_name}"
device = "cuda"

# Load the reward model (bfloat16 with FlashAttention 2) and its tokenizer
model = AutoModelForSequenceClassification.from_pretrained(repo_id, trust_remote_code=True, torch_dtype=torch.bfloat16, attn_implementation="flash_attention_2", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)

# Load the decision tree that maps attribute rewards to a preference
model.load_decision_tree(repo_id, filename="decision_tree.pkl")

# A prompt and two candidate responses (the second one makes an arithmetic mistake)
prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

# Score both responses on all reward attributes and compare them
output = model.compare(prompt, response1, response2, tokenizer, device)
print("Response 1 rewards")
print(dict(zip(output["attributes"], output["rewards"][0])))
print("Response 2 rewards")
print(dict(zip(output["attributes"], output["rewards"][1])))
print("Model preference")
print(output["preference"])
```
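To turn the comparison into a simple verdict, the returned `preference` value can be thresholded. The follow-up below is a sketch: it assumes `preference` is the probability that the first response is preferred, so check its exact semantics against the repository before relying on it. In this example the first response does the arithmetic correctly (3 apples each), so a well-behaved reward model should prefer it.

```python
# Assumption: output["preference"] is the probability that response1 is
# preferred over response2 (confirm against the repository documentation).
prob_first = float(output["preference"])
winner = "Response 1" if prob_first > 0.5 else "Response 2"
print(f"{winner} is preferred (p = {prob_first:.3f})")
```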
✨ Key Features
- Leaderboard results: shows how different models rank on the RewardBench leaderboard (January 2025), including overall scores and scores on each individual category.
- Code example: provides a detailed, ready-to-run usage example so users can quickly start comparing responses and evaluating rewards with the model.
📚 Documentation
Author Information
- Author: Min Li
- Blog: https://rlhflow.github.io/posts/2025-01-22-decision-tree-reward-model/
Model Information
Code Repository
- Repository: https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/main/decision_tree
Technical Report
RewardBench Leaderboard (January 2025)
| Rank | Model | Base Model | Method | Overall Score | Chat | Chat Hard | Safety | Reasoning |
|------|-------|------------|--------|---------------|------|-----------|--------|-----------|
| 1 | Decision-Tree-Reward-Gemma-2-27B | Gemma-2-27B | Decision Tree | 95.4 | 96.9 | 91.4 | 93.9 | 99.2 |
| 2 | INF-QRM-Llama3.1-70B | Llama-3.1-70B | Sequence Classifier | 95.1 | 96.6 | 91.0 | 93.6 | 99.1 |
| 3 | Decision-Tree-Reward-Llama-3.1-8B | Llama-3.1-8B | Decision Tree | 94.5 | 96.6 | 89.5 | 93.2 | 98.6 |
| 4 | QRM-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 94.4 | 96.6 | 90.1 | 92.7 | 98.3 |
| 5 | Skywork-Reward-Gemma-2-27B-v0.2 | Gemma-2-27B | Sequence Classifier | 94.3 | 96.1 | 89.9 | 93.0 | 98.1 |
| 6 | Llama-3.1-Nemotron-70B-Reward | Llama-3.1-70B | Custom Classifier | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
| 7 | Skywork-Reward-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 93.8 | 95.8 | 91.4 | 91.9 | 96.1 |
| 8 | TextEval-Llama3.1-70B | Llama-3.1-70B | Generative | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| 9 | MetaMetrics-RM-v1.0 | - | Custom Classifier | 93.4 | 98.3 | 86.4 | 90.8 | 98.2 |
| 10 | Skywork-Critic-Llama-3.1-70B | Llama-3.1-70B | Generative | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
| 11 | QRM-Llama3.1-8B-v2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 96.4 | 86.8 | 92.6 | 96.8 |
| 12 | Skywork-Reward-Llama-3.1-8B-v0.2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 94.7 | 88.4 | 92.7 | 96.7 |
📄 License
⚠️ Important Notice
This model is fine-tuned from a Skywork model, so its use is subject to the Skywork Community License Agreement. Skywork models support commercial use, but if you plan to use Skywork models or their derivatives for commercial purposes, you must comply with the terms and conditions of the Skywork Community License Agreement.
📋 To-Do