RM-R1 Qwen2.5 Instruct 14B

Developed by: gaotang
RM-R1 is a training framework for reasoning reward models (ReasRM), which evaluate candidate answers by generating scoring rubrics or reasoning traces, making their assessments fully explainable.
Downloads: 21
Release date: 5/6/2025

Model Overview

RM-R1 is a reward model framework that improves both accuracy and interpretability through two-phase training: (1) distillation of high-quality reasoning traces and (2) reinforcement learning with verifiable rewards.

Model Features

Reasoning Reward Model
Evaluates candidate answers by generating scoring rubrics or reasoning traces, offering fully explainable assessments.
Two-Phase Training
Phase 1 distills ~8.7K high-quality reasoning traces; Phase 2 applies reinforcement learning with verifiable rewards (RLVR) on ~64K preference pairs.
High Performance
Achieves up to +13.8% absolute accuracy improvement on public reward model benchmarks.

Model Capabilities

Text Ranking
Reasoning Trace Generation
Scoring Criteria Generation
Preference Judgment

Use Cases

Reinforcement Learning
RLHF/RLAIF
Serves as a plug-and-play reward function for policy optimization
Automatic Evaluation
LLM Judge
Used for automated evaluation of open-domain QA, chat, and reasoning tasks
Research
Process Supervision Research
Investigates chain-of-thought verification or scoring criteria generation
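As an LLM judge, a reasoning reward model emits free-form reasoning followed by a final verdict, which a policy-optimization loop then converts into a scalar reward. A minimal sketch of wrapping such a judge as a plug-and-play pairwise reward function; the `[[A]]`/`[[B]]` verdict format, the prompt template, and the `generate` callable are illustrative assumptions, not the model's documented API:

```python
import re

# Hypothetical judge prompt; the exact instructions RM-R1 was trained
# with may differ from this template.
JUDGE_TEMPLATE = (
    "Question: {question}\n\n"
    "Answer A: {answer_a}\n\n"
    "Answer B: {answer_b}\n\n"
    "Write out your evaluation rubric and reasoning, then end with a "
    "final verdict of [[A]] or [[B]]."
)

def extract_verdict(judgment: str):
    """Return 'A' or 'B' from the last [[A]]/[[B]] marker, else None."""
    matches = re.findall(r"\[\[([AB])\]\]", judgment)
    return matches[-1] if matches else None

def pairwise_reward(generate, question, answer_a, answer_b):
    """Scalar reward for answer A: +1 if the judge prefers A, -1 if it
    prefers B, 0 if no verdict can be parsed.

    `generate` is any callable mapping a prompt string to the judge's
    completion (e.g. a wrapped transformers text-generation pipeline).
    """
    prompt = JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )
    verdict = extract_verdict(generate(prompt))
    return {"A": 1.0, "B": -1.0}.get(verdict, 0.0)

# Stub judge standing in for the real model, for illustration only.
stub = lambda prompt: "Rubric: correctness.\nReasoning: ...\nVerdict: [[A]]"
print(pairwise_reward(stub, "What is 2+2?", "4", "5"))  # 1.0
```

Keeping the verdict parsing separate from generation means the same wrapper works for local inference, a serving endpoint, or batched evaluation in an RLHF trainer.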