# 🚀 Process Reward Model (Mistral-7B)

This is the process reward model (Mistral-7B) used in Math-Shepherd.
## 🚀 Quick Start

### Input

Provide a question together with a step-by-step solution in which each step ends with the special step tag `ки`. For example:
```
Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning, ... ? Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * 2 = $18 every day at the farmers' market. The answer is: 18 ки
```
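The input format above can be assembled programmatically. The sketch below is a hypothetical helper (not part of this repository) that appends the `ки` tag after each step and joins everything into a single string:

```python
# Hypothetical helper: build a PRM input string by appending the special
# step tag "ки" after each solution step, then prepending the question.
STEP_TAG = "ки"

def build_prm_input(question, steps):
    tagged = " ".join(f"{s} {STEP_TAG}" for s in steps)
    return f"{question} {tagged}"

question = "Janet's ducks lay 16 eggs per day. How many are left after breakfast?"
steps = [
    "Step 1: Janet's ducks lay 16 eggs per day.",
    "Step 2: She eats three for breakfast, so 16 - 3 = 13 eggs are left.",
]
prm_input = build_prm_input(question, steps)
print(prm_input.count(STEP_TAG))  # one tag per step → 2
```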
### Output

The model outputs logits, which must be post-processed to obtain a score for each step.
## 💻 Usage Example

### Basic usage
```python
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
import torch

good_token = '+'
bad_token = '-'
step_tag = 'ки'

tokenizer = AutoTokenizer.from_pretrained('peiyi9979/math-shepherd-mistral-7b-prm')
candidate_tokens = tokenizer.encode(f"{good_token} {bad_token}")[1:]  # token ids of '+' and '-'
step_tag_id = tokenizer.encode(f"{step_tag}")[-1]  # token id of the step tag
model = AutoModelForCausalLM.from_pretrained('peiyi9979/math-shepherd-mistral-7b-prm').eval()

question = """Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"""
output1 = """Step 1: Janet's ducks lay 16 eggs per day. ки\nStep 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки\nStep 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки\nStep 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки"""  # correct solution
output2 = """Step 1: Janet's ducks lay 16 eggs per day. ки\nStep 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки\nStep 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки\nStep 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $17 every day at the farmers' market. The answer is: 17 ки"""  # wrong final step

for output in [output1, output2]:
    input_for_prm = f"{question} {output}"
    input_id = torch.tensor([tokenizer.encode(input_for_prm)])
    with torch.no_grad():
        # Restrict the logits to the '+' / '-' candidate tokens at every position.
        logits = model(input_id).logits[:, :, candidate_tokens]
        # Probability of '+' (the "good step" token) at each position.
        scores = logits.softmax(dim=-1)[:, :, 0]
        # Keep only the positions where the step tag appears: one score per step.
        step_scores = scores[input_id == step_tag_id]
        print(step_scores)
```
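The per-step scores above are often collapsed into a single solution-level score for best-of-n reranking; a common choice for Math-Shepherd-style PRMs is the minimum (or product) of the step scores. The sketch below shows both aggregations; the score values are illustrative, not actual model outputs:

```python
# Sketch: aggregate per-step PRM probabilities into one solution score.
# Taking the minimum penalizes any single bad step; the product compounds
# uncertainty across steps. Both are common heuristics, not fixed rules.
def solution_score_min(step_scores):
    return min(step_scores)

def solution_score_prod(step_scores):
    p = 1.0
    for s in step_scores:
        p *= s
    return p

scores_correct = [0.99, 0.97, 0.98, 0.96]  # illustrative, e.g. for output1
scores_wrong   = [0.99, 0.97, 0.98, 0.05]  # illustrative, e.g. for output2

# The wrong solution is ranked lower under either aggregation.
assert solution_score_min(scores_correct) > solution_score_min(scores_wrong)
assert solution_score_prod(scores_correct) > solution_score_prod(scores_wrong)
```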