Interpreting Language Model Preferences Through the Lens of Decision Trees
This project uses decision trees to interpret language model preferences. It also releases reward models that score near the top of the RewardBench leaderboard.
Quick Start
Before using the model, make sure the following dependencies are installed:
- transformers==4.45.2
- torch>=2.5.0
- flash-attn>=2.6.3

Note: This code requires an NVIDIA GPU with the Ampere architecture or newer.
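The requirements above can be checked before downloading a 27B model. The following is a minimal preflight sketch, not part of the project. It uses the standard library's `importlib.metadata` and `torch.cuda.get_device_capability()`, and it assumes that CUDA compute capability 8.0 marks the start of the Ampere generation. The version comparison is deliberately simplified: it ignores pre-release tags, so the `packaging` library would be the more robust choice. The helper names are hypothetical.

```python
from importlib import metadata

# Requirements from the list above: (distribution name, operator, version).
REQUIREMENTS = [
    ("transformers", "==", "4.45.2"),
    ("torch", ">=", "2.5.0"),
    ("flash-attn", ">=", "2.6.3"),
]


def version_tuple(version: str) -> tuple[int, ...]:
    """'2.5.0+cu121' -> (2, 5, 0). Simplified: pre-release tags such as 'rc1' are dropped."""
    numbers = []
    for piece in version.split("+", 1)[0].split("."):  # drop local tags like +cu121
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare versions with '==' or '>=', the only operators the requirement list uses."""
    have, want = version_tuple(installed), version_tuple(required)
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))  # treat "2.5" as "2.5.0"
    want += (0,) * (width - len(want))
    if op == "==":
        return have == want
    if op == ">=":
        return have >= want
    raise ValueError(f"unsupported operator: {op}")


def is_ampere_or_newer(capability: tuple[int, int]) -> bool:
    """Assumption: Ampere GPUs report compute capability 8.x; newer GPUs report higher."""
    return capability >= (8, 0)


def check_environment() -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, op, required in REQUIREMENTS:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name} {installed} does not satisfy {op}{required}")
    try:
        import torch

        if not torch.cuda.is_available():
            problems.append("no CUDA GPU is visible to torch")
        elif not is_ampere_or_newer(torch.cuda.get_device_capability()):
            problems.append("GPU is older than Ampere (compute capability < 8.0)")
    except ImportError:
        pass  # a missing torch is already reported above
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment looks OK"]:
        print(problem)
```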
Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Decision-Tree-Reward-Gemma-2-27B"
repo_id = f"RLHFlow/{model_name}"
device = "cuda"

# trust_remote_code=True loads the repository's custom model class, which
# provides load_decision_tree() and compare().
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map=device,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
# Load the decision tree file stored in the same Hugging Face repository.
model.load_decision_tree(repo_id, filename="decision_tree.pkl")

prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
# response1 divides among 3 people (correct); response2 leaves Jane out and divides among 2.
response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."

# Compare the two responses: per-attribute rewards for each, plus the model's preference.
output = model.compare(prompt, response1, response2, tokenizer, device)
print("Response 1 rewards")
print(dict(zip(output["attributes"], output["rewards"][0])))
print("Response 2 rewards")
print(dict(zip(output["attributes"], output["rewards"][1])))
print("Model preference")
print(output["preference"])
```
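To see which attributes drive a comparison, you can subtract the two reward rows attribute by attribute. This sketch assumes that `output["rewards"][0]` and `output["rewards"][1]` are aligned with `output["attributes"]`, as the `zip()` calls above imply. It applies `float()` so that 0-d torch tensors would also work. The helper names are hypothetical, and the sample values are made up, not real model output. The sketch does not try to interpret `output["preference"]`.

```python
from typing import Any


def attribute_margins(output: dict[str, Any]) -> dict[str, float]:
    """Per-attribute reward of response 1 minus reward of response 2."""
    first, second = output["rewards"][0], output["rewards"][1]
    return {
        name: float(a) - float(b)
        for name, a, b in zip(output["attributes"], first, second)
    }


def format_margins(margins: dict[str, float]) -> str:
    """One line per attribute, largest advantage for response 1 first."""
    lines = []
    for name, margin in sorted(margins.items(), key=lambda item: item[1], reverse=True):
        favored = "response 1" if margin > 0 else "response 2" if margin < 0 else "tie"
        lines.append(f"{name:<12} {margin:+.3f}  ({favored})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stand-in for model.compare(...); names and numbers are invented.
    fake_output = {
        "attributes": ["helpfulness", "correctness", "verbosity"],
        "rewards": [[3.5, 3.75, 1.5], [2.0, 1.25, 1.5]],
    }
    print(format_margins(attribute_margins(fake_output)))
```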
Features
- High-scoring models: Decision-Tree-Reward-Gemma-2-27B and Decision-Tree-Reward-Llama-3.1-8B ranked 1st and 3rd, respectively, on the RewardBench leaderboard as of January 2025.
- Decision tree approach: uses decision trees to interpret language model preferences.
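This card does not describe how the project builds its tree or which attributes the tree splits on. As a generic illustration only, not the project's code, the sketch below fits a shallow classification tree from scratch. It uses CART-style threshold splits scored by Gini impurity. The features are hypothetical per-attribute reward differences (reward of response 1 minus reward of response 2), and the label records which response was preferred. The feature names and data are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None  # index of the feature tested; None marks a leaf
    threshold: float = 0.0         # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0                 # leaf output: 1 = response 1 preferred, 0 = response 2


def gini(labels: list[int]) -> float:
    """Gini impurity of binary labels: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: list[int]) -> int:
    return int(2 * sum(labels) >= len(labels))


def fit(X: list[list[float]], y: list[int], depth: int = 2) -> Node:
    """Greedy CART: choose the split with the lowest weighted Gini impurity, then recurse."""
    if depth == 0 or len(set(y)) <= 1:
        return Node(label=majority(y))
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no threshold separates the rows
        return Node(label=majority(y))
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    return Node(
        feature=f,
        threshold=t,
        left=fit([r for r, g in zip(X, go_left) if g], [lab for lab, g in zip(y, go_left) if g], depth - 1),
        right=fit([r for r, g in zip(X, go_left) if not g], [lab for lab, g in zip(y, go_left) if not g], depth - 1),
    )


def predict(node: Node, x: list[float]) -> int:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def describe(node: Node, names: list[str], indent: str = "") -> str:
    """Render the tree as nested if/else rules."""
    if node.feature is None:
        return f"{indent}-> prefer response {1 if node.label else 2}"
    return "\n".join([
        f"{indent}if {names[node.feature]} <= {node.threshold:.2f}:",
        describe(node.left, names, indent + "    "),
        f"{indent}else:",
        describe(node.right, names, indent + "    "),
    ])


if __name__ == "__main__":
    names = ["correctness_diff", "verbosity_diff"]  # hypothetical features
    X = [[1.0, -0.5], [0.8, 0.3], [0.2, 1.0], [-0.4, 0.9], [-1.0, -0.2], [-0.3, 0.5]]
    y = [1, 1, 1, 0, 0, 0]  # 1 = response 1 was preferred
    tree = fit(X, y)
    print(describe(tree, names))
    print(predict(tree, [0.5, 2.0]))
```

The fitted model is just a few threshold rules, so you can print it and read it directly. That readability is the general interpretability argument for trees over opaque scoring heads.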
Documentation
RewardBench Leaderboard (Jan 2025)
| Rank | Model | Base Model | Method | Overall Score | Chat | Chat Hard | Safety | Reasoning |
|---|---|---|---|---|---|---|---|---|
| 1 | Decision-Tree-Reward-Gemma-2-27B | Gemma-2-27B | Decision Tree | 95.4 | 96.9 | 91.4 | 93.9 | 99.2 |
| 2 | INF-QRM-Llama3.1-70B | Llama-3.1-70B | Sequence Classifier | 95.1 | 96.6 | 91.0 | 93.6 | 99.1 |
| 3 | Decision-Tree-Reward-Llama-3.1-8B | Llama-3.1-8B | Decision Tree | 94.5 | 96.6 | 89.5 | 93.2 | 98.6 |
| 4 | QRM-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 94.4 | 96.6 | 90.1 | 92.7 | 98.3 |
| 5 | Skywork-Reward-Gemma-2-27B-v0.2 | Gemma-2-27B | Sequence Classifier | 94.3 | 96.1 | 89.9 | 93.0 | 98.1 |
| 6 | Llama-3.1-Nemotron-70B-Reward | Llama-3.1-70B | Custom Classifier | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
| 7 | Skywork-Reward-Gemma-2-27B | Gemma-2-27B | Sequence Classifier | 93.8 | 95.8 | 91.4 | 91.9 | 96.1 |
| 8 | TextEval-Llama3.1-70B | Llama-3.1-70B | Generative | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| 9 | MetaMetrics-RM-v1.0 | - | Custom Classifier | 93.4 | 98.3 | 86.4 | 90.8 | 98.2 |
| 10 | Skywork-Critic-Llama-3.1-70B | Llama-3.1-70B | Generative | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
| 11 | QRM-Llama3.1-8B-v2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 96.4 | 86.8 | 92.6 | 96.8 |
| 12 | Skywork-Reward-Llama-3.1-8B-v0.2 | Llama-3.1-8B | Sequence Classifier | 93.1 | 94.7 | 88.4 | 92.7 | 96.7 |
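In every row of the leaderboard table, the Overall Score matches the unweighted mean of the four category scores, up to rounding. For example, (96.9 + 91.4 + 93.9 + 99.2) / 4 = 95.35, which is listed as 95.4. The sketch below checks this for all rows and also confirms that the rows are ordered by Overall Score. The averaging rule is an assumption inferred from these numbers, not something confirmed here against RewardBench's own scoring code.

```python
# Rows copied from the leaderboard table: (model, overall, chat, chat_hard, safety, reasoning).
LEADERBOARD = [
    ("Decision-Tree-Reward-Gemma-2-27B", 95.4, 96.9, 91.4, 93.9, 99.2),
    ("INF-QRM-Llama3.1-70B", 95.1, 96.6, 91.0, 93.6, 99.1),
    ("Decision-Tree-Reward-Llama-3.1-8B", 94.5, 96.6, 89.5, 93.2, 98.6),
    ("QRM-Gemma-2-27B", 94.4, 96.6, 90.1, 92.7, 98.3),
    ("Skywork-Reward-Gemma-2-27B-v0.2", 94.3, 96.1, 89.9, 93.0, 98.1),
    ("Llama-3.1-Nemotron-70B-Reward", 94.1, 97.5, 85.7, 95.1, 98.1),
    ("Skywork-Reward-Gemma-2-27B", 93.8, 95.8, 91.4, 91.9, 96.1),
    ("TextEval-Llama3.1-70B", 93.5, 94.1, 90.1, 93.2, 96.4),
    ("MetaMetrics-RM-v1.0", 93.4, 98.3, 86.4, 90.8, 98.2),
    ("Skywork-Critic-Llama-3.1-70B", 93.3, 96.6, 87.9, 93.1, 95.5),
    ("QRM-Llama3.1-8B-v2", 93.1, 96.4, 86.8, 92.6, 96.8),
    ("Skywork-Reward-Llama-3.1-8B-v0.2", 93.1, 94.7, 88.4, 92.7, 96.7),
]


def category_mean(row: tuple) -> float:
    """Unweighted mean of Chat, Chat Hard, Safety, and Reasoning."""
    scores = row[2:]
    return sum(scores) / len(scores)


def mismatches(rows: list[tuple], tolerance: float = 0.1) -> list[str]:
    """Models whose listed Overall Score is farther than `tolerance` from the category mean.

    A tolerance of 0.1 allows for every published number being rounded to one decimal place.
    """
    return [row[0] for row in rows if abs(category_mean(row) - row[1]) > tolerance]


def is_ranked_by_overall(rows: list[tuple]) -> bool:
    return all(a[1] >= b[1] for a, b in zip(rows, rows[1:]))


if __name__ == "__main__":
    print(f"{category_mean(LEADERBOARD[0]):.2f}")  # 95.35, listed as 95.4
    print(mismatches(LEADERBOARD))                  # []
    print(is_ranked_by_overall(LEADERBOARD))        # True
```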
Project Information
- Author: Min Li
- Blog: https://rlhflow.github.io/posts/2025-01-22-decision-tree-reward-model/
- Models:
- Code Repository: https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/main/decision_tree
- Tech Report: to be released soon
To-Do
- [x] Reward model usage code
- [ ] Architecture diagram
License
Note: This model is fine-tuned from a Skywork model, which is distributed under the following license agreement:
Community use of the Skywork model requires the Skywork Community License. The Skywork model supports commercial use. If you plan to use the Skywork model or its derivatives for commercial purposes, you must abide by the terms and conditions of the Skywork Community License.