RM-R1 Qwen2.5 Instruct 7B
RM-R1 is a training framework for reasoning reward models (ReasRMs), which evaluate candidate answers by generating scoring criteria or reasoning traces, yielding significantly better accuracy and interpretability than traditional reward models.
Downloads: 23
Release Time: 5/6/2025
Model Overview
RM-R1 is a reward model training framework that adopts a two-stage approach: first distilling high-quality reasoning traces, then applying reinforcement learning with verifiable rewards. The model generates interpretable scoring criteria, significantly improving the accuracy of preference judgments.
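The sketch below shows how such a reasoning reward model can be queried as a pairwise judge with the transformers library. The checkpoint id and the judge prompt template are placeholders for illustration, not the exact names or format used by the RM-R1 release.

```python
# Minimal sketch: querying a generative (reasoning) reward model as a pairwise judge.
# "RM-R1/RM-R1-Qwen2.5-Instruct-7B" is a placeholder checkpoint id, and the prompt
# template is a simplified assumption rather than the exact training-time format.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "RM-R1/RM-R1-Qwen2.5-Instruct-7B"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the model to write scoring criteria, reason over both answers,
    and end with a verdict line such as 'Verdict: A' or 'Verdict: B'."""
    prompt = (
        "You are a reward model. First write scoring criteria for the question, "
        "then evaluate both answers against them, and finish with 'Verdict: A' or 'Verdict: B'.\n\n"
        f"Question: {question}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024, do_sample=False)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(judge("What is 2 + 2?", "4", "5"))
```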
Model Features
Reasoning Reward Model
Evaluates candidate answers by generating scoring criteria or reasoning traces, offering higher accuracy and interpretability compared to traditional scalar reward models.
Two-Stage Training
The first stage distills high-quality reasoning traces (~8.7K entries); the second stage applies reinforcement learning with verifiable rewards (RLVR) on ~64K preference pairs (a minimal reward sketch follows the feature list).
Performance Improvement
Achieves an absolute accuracy improvement of up to 13.8% on public reward model benchmarks.
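As a rough illustration of the verifiable-reward idea in the second training stage, the sketch below parses a rollout's final verdict and scores it against the ground-truth preference label. The function name and the "Verdict: A/B" convention are assumptions for illustration, not the paper's exact reward definition.

```python
import re

def verifiable_reward(judge_output: str, preferred: str) -> float:
    """Illustrative rule-based reward for the RLVR stage:
    +1 if the generated verdict matches the ground-truth preferred answer
    ('A' or 'B'), -1 otherwise. Assumes the rollout ends with 'Verdict: A/B'."""
    match = re.search(r"Verdict:\s*([AB])", judge_output)
    if match is None:
        return -1.0  # malformed output receives the lowest reward
    return 1.0 if match.group(1) == preferred else -1.0

# Example: a rollout that ends with the correct verdict earns +1.
print(verifiable_reward("...criteria... Answer A is correct. Verdict: A", "A"))  # 1.0
```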
Model Capabilities
Preference Judgment
Scoring Criteria Generation
Reasoning Trace Generation
Text Quality Evaluation
Use Cases
Reinforcement Learning
RLHF/RLAIF
Serves as a plug-and-play reward function for policy optimization.
Provides more accurate and interpretable reward signals.
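One possible way to use a pairwise judge as a scalar reward in a PPO/GRPO-style loop is to compare each policy sample against a fixed reference answer; this wrapper is a hypothetical sketch (not the paper's recipe) and reuses the `judge` helper sketched above.

```python
def pairwise_reward(question: str, policy_answer: str, reference_answer: str) -> float:
    """Hypothetical wrapper: score a policy rollout by asking the judge to compare
    it against a reference answer. Relies on the `judge` helper defined earlier."""
    # Candidate A = policy answer, candidate B = reference answer.
    verdict = judge(question, policy_answer, reference_answer)
    return 1.0 if "Verdict: A" in verdict else 0.0

# A policy-optimization trainer could call this per sampled response:
# rewards = [pairwise_reward(q, r, ref) for r in sampled_responses]
```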
Automated Evaluation
LLM Referee
Evaluates response quality in open-domain QA, dialogue, and reasoning tasks.
Provides interpretable scoring rationale.
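For automated evaluation, a simple pattern is to measure the judge's agreement with human preference labels on a small held-out set. The sketch below assumes an iterable of labeled examples and reuses the `judge` helper above; the field names are illustrative.

```python
def referee_accuracy(examples) -> float:
    """Agreement between the model's verdicts and human preference labels.
    `examples` is assumed to be an iterable of dicts with keys
    'question', 'answer_a', 'answer_b', and 'label' ('A' or 'B')."""
    correct = 0
    for ex in examples:
        verdict = judge(ex["question"], ex["answer_a"], ex["answer_b"])
        predicted = "A" if "Verdict: A" in verdict else "B"
        correct += int(predicted == ex["label"])
    return correct / len(examples)
```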
Research
Process Supervision Research
Explores chain-of-thought verification or scoring criteria generation.