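🚀 Math Reward Model (Mistral-7B)
This project provides the process reward model (PRM) for Math-Shepherd, based on mistral-7b. Given a question and a step-by-step solution, the model outputs logits that can be post-processed into a score for each step.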
🚀 Quick Start
Input Format
The input is a question followed by a step-by-step solution in which every step ends with the special step tag ки, for example:
Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes .... ? Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки
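The layout above can be reproduced with a small helper. This is a sketch that assumes the conventions visible in the example: steps are numbered "Step i:", joined by newlines, each step ends with a space plus ки, and the question is separated from the solution by a single space. The name build_prm_input is hypothetical and not part of the model release.

```python
# Hypothetical helper (not part of the model's release): formats a question
# and a list of step strings in the layout shown in the example above.
STEP_TAG = "ки"  # special step tag used by math-shepherd-mistral-7b-prm


def build_prm_input(question: str, steps: list[str]) -> str:
    """Join the question and numbered steps, ending each step with the step tag."""
    solution = "\n".join(
        f"Step {i}: {step} {STEP_TAG}" for i, step in enumerate(steps, start=1)
    )
    # Question and solution are separated by a single space,
    # matching f"{question} {output}" in the usage example.
    return f"{question} {solution}"


if __name__ == "__main__":
    text = build_prm_input("What is 2 + 3?", ["2 + 3 = 5.", "The answer is: 5"])
    print(text)
```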
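Output Format
The output is a tensor of logits. To obtain a score for each step, keep only the logits of the '+' and '-' tokens, apply a softmax over these two, and read the probability of '+' at each position of the step tag ки.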
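As a sketch of what this post-processing computes, the pure-Python function below mirrors the tensor code in the usage example (logits[:,:,candidate_tokens], softmax over the last dimension, then masking with input_id == step_tag_id). A softmax over two logits a and b gives exp(a) / (exp(a) + exp(b)) = sigmoid(a - b), so the sketch uses a numerically stable sigmoid. The function name and the toy token ids are hypothetical; the real ids come from the tokenizer.

```python
import math


def _stable_sigmoid(x: float) -> float:
    """Logistic function that does not overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def step_scores_from_logits(token_ids, pair_logits, step_tag_id):
    """Return P('+') at every position whose token id equals step_tag_id.

    token_ids:   token ids of one input sequence
    pair_logits: (logit_plus, logit_minus) for each position, i.e. the model's
                 logits restricted to the '+' and '-' token ids
    """
    scores = []
    for tok, (plus, minus) in zip(token_ids, pair_logits):
        if tok == step_tag_id:
            # Two-way softmax probability of '+' equals sigmoid(plus - minus).
            scores.append(_stable_sigmoid(plus - minus))
    return scores


if __name__ == "__main__":
    # Toy values: 7 stands in for the step-tag id (hypothetical).
    ids = [3, 7, 5, 7]
    logits = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
    print(step_scores_from_logits(ids, logits, step_tag_id=7))
```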
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
import torch

good_token = '+'
bad_token = '-'
step_tag = 'ки'

tokenizer = AutoTokenizer.from_pretrained('peiyi9979/math-shepherd-mistral-7b-prm')
# Token ids of '+' and '-' ([1:] drops the BOS token the tokenizer prepends)
candidate_tokens = tokenizer.encode(f"{good_token} {bad_token}")[1:]
# Token id of the step tag 'ки'
step_tag_id = tokenizer.encode(f"{step_tag}")[-1]
model = AutoModelForCausalLM.from_pretrained('peiyi9979/math-shepherd-mistral-7b-prm').eval()

question = """Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"""
# A correct solution
output1 = """Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки"""
# The same solution with an arithmetic error in step 4 (9 * $2 = $17)
output2 = """Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $17 every day at the farmers' market. The answer is: 17 ки"""

for output in [output1, output2]:
    # Question and solution are separated by a single space
    input_for_prm = f"{question} {output}"
    input_id = torch.tensor([tokenizer.encode(input_for_prm)])

    with torch.no_grad():
        # Keep only the logits of the '+' and '-' tokens at every position
        logits = model(input_id).logits[:, :, candidate_tokens]
        # Probability of '+' (softmax over the two candidates) at every position
        scores = logits.softmax(dim=-1)[:, :, 0]
        # Read the scores at the step-tag positions: one score per step
        step_scores = scores[input_id == step_tag_id]
        print(step_scores)
```
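The usage example prints one score per step. Comparing whole candidate solutions such as output1 and output2 requires reducing those step scores to a single number, which this README does not specify. The sketch below assumes the minimum over steps (with the product as an alternative); solution_score and rerank are hypothetical names. In the code above step_scores is a tensor, so convert it with step_scores.tolist() before passing it in.

```python
def solution_score(step_scores, reduce="min"):
    """Collapse per-step scores into one solution-level score (assumed aggregation)."""
    if not step_scores:
        raise ValueError("no step tags found in the input")
    if reduce == "min":
        # A solution is treated as only as good as its weakest step.
        return min(step_scores)
    if reduce == "prod":
        result = 1.0
        for s in step_scores:
            result *= s
        return result
    raise ValueError(f"unknown reduce: {reduce!r}")


def rerank(candidates, reduce="min"):
    """Sort (name, step_scores) pairs from highest to lowest solution score."""
    return sorted(candidates, key=lambda c: solution_score(c[1], reduce), reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real model outputs.
    candidates = [
        ("output2", [0.9, 0.9, 0.95, 0.1]),
        ("output1", [0.9, 0.9, 0.95, 0.9]),
    ]
    print([name for name, _ in rerank(candidates)])
```

Taking the minimum means a single low-scoring step, such as the faulty arithmetic in output2, is enough to push a solution down the ranking.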