🚀 Reward Model (based on Gemma-2b-it)
This is a reward model based on Gemma-2b-it, trained on the weqweasdas/preference_dataset_mixture2_and_safe_pku dataset with the Bradley-Terry (BT) loss. It is particularly useful when you need a capable, small reward model for large language models (LLMs). You may also refer to [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg), a stronger 2B reward model trained with hidden-state regularization.
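Here, "BT loss" refers to the Bradley-Terry pairwise objective: the model is trained to assign a higher scalar reward to the chosen response than to the rejected one in each preference pair. The snippet below is a minimal illustrative sketch of that objective, not the actual training code used for this model.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(rewards_chosen: torch.Tensor, rewards_rejected: torch.Tensor) -> torch.Tensor:
    # Negative log-likelihood that the chosen response outranks the rejected one:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()
```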
🚀 Quick Start
Model Evaluation
We evaluate this reward model on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench); the per-category scores are pairwise accuracies (a brief sketch of the metric follows the table).
| Model | Average | Chat | Chat Hard | Safety | Reasoning |
|---|---|---|---|---|---|
| [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg) (Ours, 2B) | 75.3 | 95.5 | 48.7 | 80.0 | 76.8 |
| berkeley-nest/Starling-RM-7B-alpha (7B) | 74.6 | 98 | 43.4 | 88.6 | 74.6 |
| Ray2333/Gemma-2B-rewardmodel-baseline (Ours, 2B) | 73.7 | 94.1 | 46.1 | 79.6 | 75.0 |
| stabilityai/stablelm-zephyr-3b (3B) | 73.1 | 86.3 | 60.1 | 70.3 | 75.7 |
| openbmb/UltraRM-13b (13B) | 71.3 | 96.1 | 55.3 | 45.8 | 82 |
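Each per-category score above is a pairwise accuracy: the reward model sees a chosen and a rejected response for the same prompt and is counted correct when it scores the chosen one higher. A minimal sketch of that metric is shown below; the function name and inputs are ours, not part of the benchmark code.

```python
def pairwise_accuracy(rewards_chosen, rewards_rejected):
    """Fraction of preference pairs where the chosen response receives the higher reward."""
    correct = sum(float(rc > rr) for rc, rr in zip(rewards_chosen, rewards_rejected))
    return correct / len(rewards_chosen)
```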
💻 Usage Example
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the reward model (a sequence classifier with a single scalar output).
tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/Gemma-2B-rewardmodel-baseline',
    num_labels=1, torch_dtype=torch.float16,
    device_map=0,
)

message = [
    {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her?"},
    {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"}
]
# Format the conversation with the model's chat template and tokenize it.
message_template = tokenizer.apply_chat_template(message, tokenize=False)
kwargs = {"padding": 'longest', "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)

# The single classification logit is the scalar reward for this conversation.
with torch.no_grad():
    reward_tensor = reward_model(
        tokens["input_ids"].to(reward_model.device),
        attention_mask=tokens["attention_mask"].to(reward_model.device),
    ).logits.reshape(-1)
    reward = reward_tensor.cpu().detach().item()
```
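In practice, a reward model is most often used to compare alternative responses to the same prompt and keep the higher-scoring one. The sketch below reuses the `tokenizer` and `reward_model` loaded above; the prompt and candidate replies are illustrative placeholders, not part of the model card.

```python
def score(conversation):
    """Return the scalar reward for a list of chat messages."""
    text = tokenizer.apply_chat_template(conversation, tokenize=False)
    inputs = tokenizer(text, padding='longest', truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = reward_model(
            inputs["input_ids"].to(reward_model.device),
            attention_mask=inputs["attention_mask"].to(reward_model.device),
        ).logits
    return logits.reshape(-1).cpu().item()

prompt = {'role': 'user', 'content': "Can you help me write a polite follow-up email to a recruiter?"}
reply_a = {'role': 'assistant', 'content': "Of course! Here's a short, polite follow-up you can adapt: ..."}
reply_b = {'role': 'assistant', 'content': "No."}

# Keep whichever candidate the reward model scores higher.
best_reply = max([reply_a, reply_b], key=lambda reply: score([prompt, reply]))
```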
📄 License
This project is released under the MIT License.