RM-R1 Qwen2.5 Instruct 14B
RM-R1 is a training framework for reasoning reward models (ReasRMs), which evaluate candidate answers by generating scoring criteria or reasoning traces, yielding fully explainable assessments.
Release date: 5/6/2025
Model Overview
RM-R1 is a reward-model framework that improves both accuracy and interpretability through two-phase training: distillation of high-quality reasoning traces, followed by reinforcement learning with verifiable rewards.
Model Features
Reasoning Reward Model
Evaluates candidate answers by generating scoring criteria or reasoning traces, offering fully explainable assessments.
Two-Phase Training
Phase 1 distills ~8.7K high-quality reasoning traces; Phase 2 applies reinforcement learning with verifiable rewards (RLVR) on ~64K preference pairs.
High Performance
Achieves up to +13.8% absolute accuracy improvement on public reward model benchmarks.
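The RLVR phase above can be illustrated with a minimal sketch: because each training example is a preference pair with a known gold preference, the reward is verifiable by checking the model's final verdict against it. The verdict tag format `[[A]]`/`[[B]]` below is an assumption for illustration; the actual RM-R1 output template may differ.

```python
import re

def verifiable_reward(judgment: str, gold_preference: str) -> float:
    """Return +1.0 if the reasoning trace's final verdict matches the gold
    preference, -1.0 otherwise (including unparseable outputs).

    Assumes the trace ends with a verdict tag like '[[A]]' or '[[B]]'
    (a hypothetical format, not necessarily RM-R1's exact template).
    """
    match = re.search(r"\[\[([AB])\]\]\s*$", judgment.strip())
    if match is None:
        return -1.0  # no parseable verdict: minimum reward
    return 1.0 if match.group(1) == gold_preference else -1.0
```

Because this reward requires no learned scorer, it gives the RL phase a clean, non-gameable training signal.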
Model Capabilities
Text Ranking
Reasoning Trace Generation
Scoring Criteria Generation
Preference Judgment
Use Cases
Reinforcement Learning
RLHF/RLAIF
Serves as a plug-and-play reward function for policy optimization
Automatic Evaluation
LLM Judge
Used for automated evaluation of open-domain QA, chat, and reasoning tasks
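As a sketch of the LLM-judge use case, the helper below builds a pairwise-judgment prompt asking the model to first write scoring criteria and a reasoning trace, then end with a verdict. The wording and the `[[A]]`/`[[B]]` convention are illustrative assumptions; released RM-R1 checkpoints ship their own chat template, which should be used in practice.

```python
def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Build an illustrative pairwise-judgment prompt for a reasoning
    reward model: criteria first, then reasoning, then a final verdict."""
    return (
        "Evaluate the two responses to the question below. First write your "
        "scoring criteria, then reason step by step, and end with your "
        "verdict as [[A]] or [[B]].\n\n"
        f"Question: {question}\n\n"
        f"Response A: {answer_a}\n\n"
        f"Response B: {answer_b}\n"
    )
```

The generated prompt can be passed to the model's standard chat interface; the explainable trace it returns is what distinguishes a ReasRM from a scalar reward model.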
Research
Process Supervision Research
Investigates chain-of-thought verification or scoring criteria generation