RM-R1 DeepSeek-Distilled Qwen-14B
RM-R1 is a training framework for reasoning reward models (ReasRMs), which evaluate candidate answers by generating scoring criteria or reasoning traces, providing explainable judgments.
Downloads: 95
Release Time: 5/6/2025
Model Overview
This model adopts a two-stage training approach: it first distills high-quality reasoning traces, then optimizes with reinforcement learning using verifiable rewards. It is suitable for RLHF/RLAIF, automated evaluation, and research use.
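A minimal usage sketch with Hugging Face transformers is shown below. The repository ID and the pairwise-judge prompt format are assumptions; check the model card for the exact chat template and verdict format.

```python
# Minimal pairwise-judgment sketch; model ID and prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

question = "What is the capital of France?"
answer_a = "The capital of France is Paris."
answer_b = "France's capital is Lyon."

# Ask the model to reason first, then emit a final verdict.
prompt = (
    "Please act as an impartial judge and evaluate which of the two responses "
    "below answers the question better. Reason step by step, then output "
    "'[[A]]' or '[[B]]' as your final verdict.\n\n"
    f"Question: {question}\n\nResponse A: {answer_a}\n\nResponse B: {answer_b}"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```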
Model Features
Reasoning Reward Modeling
Evaluates answers by generating scoring criteria or reasoning traces, providing a fully explainable judgment process.
Two-Stage Training
First distills high-quality reasoning traces, then optimizes with reinforcement learning using verifiable rewards, as sketched below.
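The second stage can be illustrated with a toy verifiable reward: a generated trace is rewarded only if its parsed verdict matches the ground-truth preference label. This is a sketch of the idea, not the authors' exact implementation, and the '[[A]]'/'[[B]]' verdict format is an assumption.

```python
import re

def verifiable_reward(generated_trace: str, preferred: str) -> float:
    """Reward 1.0 iff the trace's verdict matches the label ('A' or 'B')."""
    match = re.search(r"\[\[([AB])\]\]", generated_trace)
    if match is None:
        return 0.0  # unparseable traces earn no reward
    return 1.0 if match.group(1) == preferred else 0.0

assert verifiable_reward("...reasoning... [[A]]", "A") == 1.0
assert verifiable_reward("no verdict here", "B") == 0.0
```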
High Performance
Achieves an absolute accuracy improvement of up to +13.8% on public reward model benchmarks.
Model Capabilities
Text Ranking
Generating Scoring Criteria
Generating Reasoning Traces
Preference Expression
Use Cases
Reinforcement Learning
RLHF/RLAIF
Used as a plug-and-play reward function for policy optimization.
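One way to plug the judge into a policy-optimization loop is sketched below: reward the policy whenever the judge prefers its answer over a reference. The `judge` callable is a hypothetical stand-in for a model call such as the generation snippet above.

```python
from typing import Callable

def pairwise_reward(
    judge: Callable[[str, str, str], str],
    question: str,
    policy_answer: str,
    reference_answer: str,
) -> float:
    """judge returns 'A' or 'B'; the policy is rewarded when its answer wins."""
    return 1.0 if judge(question, policy_answer, reference_answer) == "A" else 0.0

# Trivial stand-in judge for demonstration; a real judge would query the model.
dummy_judge = lambda q, a, b: "A" if len(a) >= len(b) else "B"
print(pairwise_reward(dummy_judge, "Q?", "a longer answer", "short"))  # 1.0
```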
Automated Evaluation
LLM Judge
Used for automated evaluation of open-domain QA, chat, and reasoning.
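A judge-based evaluation loop might look like the sketch below, which computes a candidate system's win rate against a baseline. The `judge` function here is a placeholder; a real run would call the model as in the generation snippet above.

```python
def judge(question: str, answer_a: str, answer_b: str) -> str:
    return "A"  # placeholder verdict; replace with a model call

eval_set = [
    {"question": "What is 2+2?", "candidate": "4", "baseline": "5"},
    {"question": "Capital of Japan?", "candidate": "Tokyo", "baseline": "Kyoto"},
]
wins = sum(
    judge(ex["question"], ex["candidate"], ex["baseline"]) == "A"
    for ex in eval_set
)
print(f"Candidate win rate: {wins / len(eval_set):.0%}")
```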
Research
Process Supervision Research
Used for studying chain-of-thought verification or scoring criteria generation.
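For trace-analysis research, a judgment can be split into its scoring criteria and final verdict. The `<rubric>` tag name and the '[[A]]'/'[[B]]' verdict format below are assumptions; inspect actual model outputs to confirm the trace structure.

```python
import re

def parse_trace(trace: str) -> dict:
    """Extract the (assumed) rubric section and final verdict from a trace."""
    rubric = re.search(r"<rubric>(.*?)</rubric>", trace, re.DOTALL)
    verdict = re.search(r"\[\[([AB])\]\]", trace)
    return {
        "rubric": rubric.group(1).strip() if rubric else None,
        "verdict": verdict.group(1) if verdict else None,
    }

print(parse_trace("<rubric>accuracy; clarity</rubric> ... [[B]]"))
```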