🚀 Reward Model (based on Gemma-2b-it)
This is a reward model based on Gemma-2b-it, trained on the weqweasdas/preference_dataset_mixture2_and_safe_pku dataset with the Bradley-Terry (BT) loss. It is particularly useful when you need a capable, small reward model for large language models (LLMs). You may also refer to [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg), a stronger 2B reward model trained with hidden-state regularization.
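Here, "BT loss" refers to the Bradley-Terry pairwise objective: the model is trained to assign a higher scalar reward to the chosen response than to the rejected one in each preference pair. The snippet below is a minimal illustrative sketch of that objective, not the actual training code used for this model.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(rewards_chosen: torch.Tensor, rewards_rejected: torch.Tensor) -> torch.Tensor:
    # Negative log-likelihood that the chosen response outranks the rejected one:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()
```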
🚀 Quick Start
Model Evaluation
We evaluate this reward model on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench); the per-category scores are pairwise accuracies (a brief sketch of the metric follows the table).
| Model | Average | Chat | Chat Hard | Safety | Reasoning |
|---|---|---|---|---|---|
| [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg) (Ours, 2B) | 75.3 | 95.5 | 48.7 | 80.0 | 76.8 |
| berkeley-nest/Starling-RM-7B-alpha (7B) | 74.6 | 98 | 43.4 | 88.6 | 74.6 |
| Ray2333/Gemma-2B-rewardmodel-baseline (Ours, 2B) | 73.7 | 94.1 | 46.1 | 79.6 | 75.0 |
| stabilityai/stablelm-zephyr-3b (3B) | 73.1 | 86.3 | 60.1 | 70.3 | 75.7 |
| openbmb/UltraRM-13b (13B) | 71.3 | 96.1 | 55.3 | 45.8 | 82 |
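Each per-category score above is a pairwise accuracy: the reward model sees a chosen and a rejected response for the same prompt and is counted correct when it scores the chosen one higher. A minimal sketch of that metric is shown below; the function name and inputs are ours, not part of the benchmark code.

```python
def pairwise_accuracy(rewards_chosen, rewards_rejected):
    """Fraction of preference pairs where the chosen response receives the higher reward."""
    correct = sum(float(rc > rr) for rc, rr in zip(rewards_chosen, rewards_rejected))
    return correct / len(rewards_chosen)
```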
💻 Usage Example
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the reward model (a sequence classifier with a single scalar output).
tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/Gemma-2B-rewardmodel-baseline',
    num_labels=1, torch_dtype=torch.float16,
    device_map=0,
)

message = [
    {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her?"},
    {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"}
]
# Format the conversation with the model's chat template and tokenize it.
message_template = tokenizer.apply_chat_template(message, tokenize=False)
kwargs = {"padding": 'longest', "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)

# The single classification logit is the scalar reward for this conversation.
with torch.no_grad():
    reward_tensor = reward_model(
        tokens["input_ids"].to(reward_model.device),
        attention_mask=tokens["attention_mask"].to(reward_model.device),
    ).logits.reshape(-1)
    reward = reward_tensor.cpu().detach().item()
```
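In practice, a reward model is most often used to compare alternative responses to the same prompt and keep the higher-scoring one. The sketch below reuses the `tokenizer` and `reward_model` loaded above; the prompt and candidate replies are illustrative placeholders, not part of the model card.

```python
def score(conversation):
    """Return the scalar reward for a list of chat messages."""
    text = tokenizer.apply_chat_template(conversation, tokenize=False)
    inputs = tokenizer(text, padding='longest', truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = reward_model(
            inputs["input_ids"].to(reward_model.device),
            attention_mask=inputs["attention_mask"].to(reward_model.device),
        ).logits
    return logits.reshape(-1).cpu().item()

prompt = {'role': 'user', 'content': "Can you help me write a polite follow-up email to a recruiter?"}
reply_a = {'role': 'assistant', 'content': "Of course! Here's a short, polite follow-up you can adapt: ..."}
reply_b = {'role': 'assistant', 'content': "No."}

# Keep whichever candidate the reward model scores higher.
best_reply = max([reply_a, reply_b], key=lambda reply: score([prompt, reply]))
```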
📄 License
This project is released under the MIT License.