ThinkPRM-1.5B

Developed by launch
ThinkPRM-1.5B is a generative process reward model built on R1-Distill-Qwen-1.5B. It verifies a reasoning process step by step by generating a verification chain of thought.
Downloads 68
Release Time: 4/25/2025

Model Overview

This model is specifically designed to verify the correctness of step-by-step reasoning processes. It can generate explicit verification chains of thought and annotate the correctness of each step, offering high data efficiency and robust performance.

Model Features

High Data Efficiency
Requires far less supervised data than traditional discriminative PRMs: fine-tuning uses only 1,000 synthetically generated verification chains of thought.
Generative Verification
Provides step-level verification scores by generating natural language critiques and correctness judgments, offering interpretability.
Multi-domain Applicability
Evaluated in mathematical reasoning, scientific QA, and code generation domains, outperforming baseline models.
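
As a rough illustration of how step-level judgments might be read off a generated verification chain, the sketch below extracts per-step verdicts from plain text. The \boxed{correct} / \boxed{incorrect} marker format, the sample text, and the function names are assumptions made for this example; the model's actual output format is not specified on this page.

```python
import re
from typing import List, Optional

# Hypothetical verdict marker; the model's real output format may differ.
VERDICT_RE = re.compile(r"\\boxed\{(correct|incorrect)\}")

def parse_step_verdicts(verification_text: str) -> List[bool]:
    """Return one boolean per verdict marker found, in order of appearance."""
    return [m.group(1) == "correct" for m in VERDICT_RE.finditer(verification_text)]

def first_error_step(verdicts: List[bool]) -> Optional[int]:
    """Return the 1-based index of the first step judged incorrect, or None."""
    for i, ok in enumerate(verdicts, start=1):
        if not ok:
            return i
    return None

# Made-up verification chain, used only to exercise the parser.
example = (
    "Step 1: The equation is set up properly. \\boxed{correct}\n"
    "Step 2: The sign is flipped when moving 3 across. \\boxed{incorrect}\n"
)
verdicts = parse_step_verdicts(example)
print(verdicts, first_error_step(verdicts))
```

Because the critique text stays alongside each verdict, a reader can inspect why a step was flagged rather than seeing only a score.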

Model Capabilities

Generate verification chains of thought
Step-level correctness judgment
Solution scoring
Independent verification of problem-solution pairs
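
One way solution scoring could be applied is to rank several candidate solutions to the same problem. The sketch below assumes each candidate has already been verified and reduced to a list of per-step verdicts. The scoring rule (fraction of steps before the first error) and the names solution_score and pick_best are illustrative choices, not the model's documented scoring method.

```python
from typing import List, Sequence

def solution_score(verdicts: Sequence[bool]) -> float:
    """Fraction of steps before the first one judged incorrect (hypothetical rule)."""
    if not verdicts:
        return 0.0
    correct_prefix = 0
    for ok in verdicts:
        if not ok:
            break
        correct_prefix += 1
    return correct_prefix / len(verdicts)

def pick_best(candidates: List[Sequence[bool]]) -> int:
    """Return the index of the highest-scoring candidate (first one wins ties)."""
    scores = [solution_score(v) for v in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up verdict lists for three candidate solutions.
candidates = [
    [True, False, True],   # error at step 2
    [True, True, True],    # every step judged correct
    [False, True],         # error at step 1
]
best = pick_best(candidates)
print(best, [solution_score(v) for v in candidates])
```

Since each problem-solution pair is verified independently, the candidates can be scored in any order or in parallel before ranking.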

Use Cases

Mathematical Reasoning
Mathematical Problem-solving Step Verification
Verify the correctness of each step in a mathematical solution, such as equation solving or a proof.
Excellent performance on benchmarks like MATH-500 and AIME '24.
Code Generation
Code Verification
Verify that the logic of generated code is correct.
Excellent performance on the LiveCodeBench benchmark.
Scientific QA
Scientific Problem Answer Verification
Verify the correctness of steps in scientific problem answers.
Excellent performance on the GPQA-Diamond benchmark.