
# Reasoning Reward Modeling

## RM-R1-DeepSeek-Distilled-Qwen-32B (MIT)
RM-R1 is a training framework for reasoning reward models (ReasRM) that evaluates candidate answers by generating scoring criteria or reasoning trajectories, providing interpretable evaluations.
Tags: Large Language Model · Transformers · English
Author: gaotang · Downloads: 506 · Likes: 0
## RM-R1-Qwen2.5-Instruct-7B (MIT)
RM-R1 is a training framework for reasoning reward models (ReasRM) that evaluates candidate answers by generating scoring criteria or reasoning traces, significantly improving accuracy and interpretability over traditional reward models.
Tags: Large Language Model · Transformers · English
Author: gaotang · Downloads: 23 · Likes: 2
## RM-R1-Qwen2.5-Instruct-14B (MIT)
RM-R1 is a training framework for reasoning reward models (ReasRM) that evaluates candidate answers by generating scoring criteria or reasoning traces, providing explainable assessments.
Tags: Large Language Model · Transformers · English
Author: gaotang · Downloads: 21 · Likes: 1
## RM-R1-Qwen2.5-Instruct-32B (MIT)
RM-R1 is a framework for reward modeling through reasoning-trajectory generation, offering significant gains in accuracy and interpretability over traditional reward models.
Tags: Large Language Model · Transformers · English
Author: gaotang · Downloads: 29 · Likes: 1
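Because these checkpoints are ordinary causal language models, a pairwise judgment can be sketched with the standard Hugging Face `transformers` generation API. The following is a minimal sketch, assuming the model ID `gaotang/RM-R1-Qwen2.5-Instruct-7B` matches the listing above and that a plain chat prompt asking for evaluation criteria followed by a verdict approximates the judging format; the exact prompt template is defined on the model card, not here.

```python
# Minimal sketch: using an RM-R1 checkpoint as a generative (reasoning) reward model.
# Assumptions: the model ID below is taken from the listing above, and the plain chat
# prompt only approximates the judging format described on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gaotang/RM-R1-Qwen2.5-Instruct-7B"  # assumed Hugging Face ID, per the listing
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

question = "Explain why the sky is blue."
answer_a = "Rayleigh scattering: shorter (blue) wavelengths scatter more strongly in the atmosphere."
answer_b = "Because the ocean reflects its color onto the sky."

# Ask the model to write its scoring criteria / reasoning trace first, then a final verdict.
messages = [
    {
        "role": "user",
        "content": (
            "You are judging two candidate answers to a question.\n"
            f"Question: {question}\n\n"
            f"Answer A: {answer_a}\n\n"
            f"Answer B: {answer_b}\n\n"
            "First write your evaluation criteria and reasoning, then end with "
            "'Verdict: A' or 'Verdict: B'."
        ),
    },
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Print only the generated judgment (criteria, reasoning trace, and verdict).
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The generated text is the reward signal itself: the criteria and reasoning trace give the interpretable evaluation these models advertise, and the trailing verdict can be parsed when only a preference label is needed.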