math-shepherd-mistral-7b-prm: An Open-Source Process Reward Model for Evaluating the Correctness of Math Problem-Solving Steps


Developed by peiyi9979

A process reward model fine-tuned from Mistral-7B that evaluates the correctness of each step in a mathematical solution

Large Language Model

Transformers

#Mathematical Reasoning Evaluation #Step-by-step Logic Verification #Process Reward Model

Downloads 3,536

Release Time: 1/3/2024

Model Overview

This model is part of the Math-Shepherd project and is designed to score each step of a mathematical solution. It locates the steps through a special step tag and outputs logits, which are post-processed into a correctness score for each step.

Model Features

Step-level Evaluation

Uses the special step tag 'ки' to mark each problem-solving step and produces a separate score for every derivation step

High-precision Judgment

In the usage example, correct and incorrect steps receive clearly separated scores: the correct final step scores 0.9957, while the incorrect version of the same step scores 0.0240

Lightweight Fine-tuning

Targeted fine-tuning based on the efficient Mistral-7B model, maintaining the original model's advantages while adapting to specific tasks

Model Capabilities

Mathematical step correctness judgment

Multi-step problem decomposition evaluation

Numerical calculation verification

Logical reasoning verification

Use Cases

Educational Technology

Automatic Homework Grading

Automatically evaluates students' mathematical problem-solving processes, not just final answers

Identifies specific incorrect steps and provides targeted feedback

Intelligent Tutoring System

Real-time verification of problem-solving step correctness in online learning platforms

Helps students understand the root of errors and improve problem-solving methods

Academic Research

Mathematical Reasoning Research

Analyzes typical error patterns in large language models' mathematical reasoning

Provides data support for improving models' mathematical capabilities

🚀 Process Reward Model (mistral-7b)

This is the process reward model (mistral-7b) used in Math-Shepherd. It evaluates step-by-step solutions to math problems and outputs logits, which can be post-processed to get a score for each step.

🚀 Quick Start

Input

The input consists of a question followed by a step-by-step solution in which every step ends with the special step tag ки. For example:

Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes .... ? Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки
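
The input format above can be assembled programmatically. The following is a minimal sketch, assuming the steps are available as a list of plain strings; the helper name build_prm_input and its signature are hypothetical and not part of the Math-Shepherd code. It numbers each step, appends the step tag, and joins the result to the question with a single space, as in the example.

```python
from typing import List

STEP_TAG = "ки"  # step tag taken from the model card


def build_prm_input(question: str, steps: List[str]) -> str:
    """Join a question and its solution steps into one PRM input string.

    Hypothetical helper: each step is prefixed with "Step N: " and
    suffixed with the step tag, one step per line.
    """
    tagged = [
        f"Step {i}: {step.strip()} {STEP_TAG}"
        for i, step in enumerate(steps, start=1)
    ]
    return f"{question.strip()} " + "\n".join(tagged)


example = build_prm_input(
    "What is 2 + 3?",
    ["2 + 3 = 5.", "The answer is: 5"],
)
print(example)
```

The number of step tags in the string determines how many scores come back, so it is worth checking that each step carries exactly one tag.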

Output

The output is the logits. You need to post-process them to obtain the score of each step.
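
The post-processing comes down to a two-way softmax: at each step-tag position, the logit of the '+' token is compared with the logit of the '-' token, and the resulting probability of '+' is the step score. The sketch below illustrates that arithmetic on plain floats, without loading the model; the function name step_score is hypothetical.

```python
import math


def step_score(good_logit: float, bad_logit: float) -> float:
    """Two-way softmax: probability of the '+' token relative to '-'.

    Hypothetical helper for illustration; subtracting the maximum
    before exponentiating keeps the computation numerically stable.
    """
    m = max(good_logit, bad_logit)
    e_good = math.exp(good_logit - m)
    e_bad = math.exp(bad_logit - m)
    return e_good / (e_good + e_bad)


print(step_score(2.0, 2.0))  # equal logits give 0.5
```

A score near 1 means the model strongly prefers '+' for that step, and a score near 0 means it strongly prefers '-'.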

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

good_token = '+'  # candidate token for a correct step
bad_token = '-'   # candidate token for an incorrect step
step_tag = 'ки'   # special tag that closes every step

model_name = 'peiyi9979/math-shepherd-mistral-7b-prm'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Ids of the two candidate tokens; [1:] drops the leading BOS token.
candidate_tokens = tokenizer.encode(f"{good_token} {bad_token}")[1:] # [648, 387]
step_tag_id = tokenizer.encode(f"{step_tag}")[-1] # 12902
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

question = """Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"""
output1 = """Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. The answer is: 18 ки""" # 18 is right
output2 = """Step 1: Janet's ducks lay 16 eggs per day. ки
Step 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки
Step 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки
Step 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $17 every day at the farmers' market. The answer is: 17 ки""" # 17 is wrong

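for output in [output1, output2]:
    input_for_prm = f"{question} {output}"
    input_id = torch.tensor([tokenizer.encode(input_for_prm)])

    with torch.no_grad():
        # Keep only the logits of the two candidate tokens at every position.
        logits = model(input_id).logits[:, :, candidate_tokens]
        # Softmax over the pair; index 0 is the probability of good_token.
        scores = logits.softmax(dim=-1)[:, :, 0]
        # Read the score at each step-tag position: one score per step.
        step_scores = scores[input_id == step_tag_id]
        print(step_scores)

# tensor([0.9955, 0.9958, 0.9983, 0.9957])
# tensor([0.9955, 0.9958, 0.9983, 0.0240])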
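
To turn the per-step scores into feedback that points at a specific step, one option is to report the first step whose score falls below a cutoff. The sketch below is illustrative only: the function name first_bad_step and the 0.5 threshold are assumptions rather than values from the Math-Shepherd project, and the inputs are the two score lists printed above, copied as plain Python floats.

```python
from typing import List, Optional


def first_bad_step(step_scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the 1-based index of the first step scoring below threshold.

    Hypothetical helper; the 0.5 default is an assumed cutoff.
    Returns None when every step passes.
    """
    for i, score in enumerate(step_scores, start=1):
        if score < threshold:
            return i
    return None


print(first_bad_step([0.9955, 0.9958, 0.9983, 0.9957]))  # None
print(first_bad_step([0.9955, 0.9958, 0.9983, 0.0240]))  # 4
```

With a tensor from the loop above, step_scores.tolist() would give the list this helper expects.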
