RM-R1 Qwen2.5 Instruct 32B
RM-R1 is a framework for reward modeling through reasoning trajectory generation, offering significant improvements in accuracy and interpretability compared to traditional methods
Release Time: 5/6/2025
Model Overview
This model achieves interpretable reward scoring through two-stage training (reasoning trajectory distillation followed by reinforcement learning), making it suitable for RLHF/RLAIF pipelines and automated evaluation scenarios
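A minimal usage sketch is shown below, assuming the checkpoint is loaded as a Hugging Face causal LM; the repo id, prompt wording, and the `judge` helper are illustrative assumptions rather than the official RM-R1 interface (consult the model card for the exact chat format):

```python
# Sketch: pairwise preference judgment with a generative reward model.
# The repo id and prompt below are assumptions, not the official RM-R1 format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gaotang/RM-R1-Qwen2.5-Instruct-32B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Generate a reasoning trajectory that ends in an '[[A]]' or '[[B]]' verdict."""
    prompt = (
        "Please act as an impartial judge. First write your evaluation criteria "
        "and reasoning, then give your final verdict as '[[A]]' or '[[B]]'.\n\n"
        f"[Question]\n{question}\n\n[Answer A]\n{answer_a}\n\n[Answer B]\n{answer_b}"
    )
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens: the rubric, reasoning, and verdict.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(judge(
    "Explain why the sky is blue.",
    "Sunlight is scattered by air molecules; shorter (blue) wavelengths scatter most.",
    "The sky mirrors the color of the ocean below it.",
))
```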
Model Features
Interpretable Scoring
Provides fully transparent evaluation by generating scoring criteria or reasoning trajectories before expressing preferences
Two-Stage Training Framework
First distills 8.7K high-quality reasoning trajectories, then trains on 64K preference pairs via reinforcement learning with verifiable rewards (RLVR)
Performance Breakthrough
Achieves +13.8% absolute accuracy improvement on public benchmarks
Multi-Size Options
Offers 7B/14B/32B parameter versions and DeepSeek distilled checkpoints
Model Capabilities
Generating scoring criteria
Preference judgment
Reasoning trajectory generation
Open-domain QA evaluation
Dialogue quality scoring
Use Cases
Reinforcement Learning
RLHF/RLAIF
Serves as a plug-and-play reward function for policy optimization (see the sketch after this list)
Automated Evaluation
LLM Judge
Performs automatic scoring for open-domain QA, chat, and reasoning tasks
Research Tool
Process Supervision Research
Used for studying chain-of-thought verification or scoring criteria generation mechanisms
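For the RLHF/RLAIF use case referenced above, the generated verdict can be parsed into a scalar reward. The sketch below assumes the hypothetical `judge` helper from the Model Overview sketch; it is an illustration, not the official RM-R1 API:

```python
# Sketch: converting RM-R1's generated verdict into a scalar reward for
# policy optimization. Assumes the hypothetical judge(question, answer_a,
# answer_b) helper sketched in the Model Overview section.
import re

def pairwise_reward(question: str, policy_answer: str, reference_answer: str) -> float:
    """+1.0 if the judge prefers the policy answer, -1.0 if it prefers the
    reference answer, 0.0 if no parsable verdict is produced."""
    trajectory = judge(question, answer_a=policy_answer, answer_b=reference_answer)
    match = re.search(r"\[\[([AB])\]\]", trajectory)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == "A" else -1.0
```

In an RLHF-style loop (e.g. PPO), this scalar can be fed back as the reward for a sampled completion, while the saved reasoning trajectory records why the preference was given, which also supports the LLM Judge and process-supervision use cases.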