
Decision Tree Reward Gemma 2 27B

Developed by RLHFlow
A decision tree reward model fine-tuned from Gemma-2-27B for evaluating the quality of content generated by language models, with outstanding performance on the RewardBench leaderboard.
Downloads: 18
Release Time: 1/22/2025

Model Overview

This model interprets language model preferences through decision tree methods, assessing dimensions such as helpfulness, correctness, and coherence, and is well suited to Reinforcement Learning from Human Feedback (RLHF) scenarios.

Model Features

Decision Tree Architecture
Uses decision tree methods to analyze language model outputs, enabling more detailed evaluation across multiple quality dimensions compared to traditional sequence classifiers.
Multi-dimensional Evaluation
Can simultaneously evaluate five key dimensions: helpfulness, correctness, coherence, complexity, and detail (a scoring sketch follows this feature list).
High Performance
Achieves an overall score of 95.4 on the RewardBench leaderboard, with particularly strong results on the hard chat subset (91.4) and reasoning (99.2).
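
The snippet below is a minimal sketch of multi-dimensional scoring through the standard Hugging Face transformers interface. The repository id, the attribute order, and the assumption that the checkpoint exposes one logit per attribute via a sequence-classification head are unverified assumptions; consult the official model card for the exact usage.

```python
# Minimal sketch: score one prompt/response pair across the five quality
# dimensions. Repo id, attribute order, and the one-logit-per-attribute
# head layout are assumptions, not confirmed by the model card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "RLHFlow/Decision-Tree-Reward-Gemma-2-27B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # the checkpoint may ship custom modeling code
)

attributes = ["helpfulness", "correctness", "coherence", "complexity", "detail"]

messages = [
    {"role": "user", "content": "Explain what a reward model does in RLHF."},
    {"role": "assistant", "content": "A reward model scores candidate responses so that ..."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

with torch.no_grad():
    scores = model(input_ids).logits.squeeze(0).float()  # assumed: one score per attribute

for name, value in zip(attributes, scores.tolist()):
    print(f"{name}: {value:.3f}")
```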

Model Capabilities

Text Quality Evaluation
Multi-dimensional Scoring
Response Comparison
Reinforcement Learning Feedback

Use Cases

Language Model Training
RLHF Training
Used as a reward model in Reinforcement Learning from Human Feedback training processes.
Provides more accurate preference signals, improving the quality of language model outputs.
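
As a hedged illustration of how per-dimension scores could feed an RLHF loop, the sketch below collapses them into a single scalar reward with a weighted sum. The weights are arbitrary assumptions chosen for illustration; the model's own decision-tree aggregation of attributes may work differently.

```python
# Minimal sketch: collapse the five per-attribute scores into one scalar
# reward for an RLHF loop (e.g. PPO). The weights are illustrative
# assumptions, not part of the model; its decision-tree aggregation
# may differ.

ATTRIBUTE_WEIGHTS = {
    "helpfulness": 0.40,
    "correctness": 0.30,
    "coherence": 0.15,
    "complexity": 0.075,
    "detail": 0.075,
}

def aggregate_reward(attribute_scores: dict[str, float]) -> float:
    """Weighted sum of per-attribute scores -> a single scalar reward."""
    return sum(ATTRIBUTE_WEIGHTS[name] * attribute_scores[name]
               for name in ATTRIBUTE_WEIGHTS)

# Example with made-up scores; real values would come from the reward model.
print(aggregate_reward({
    "helpfulness": 4.1, "correctness": 3.8, "coherence": 4.5,
    "complexity": 2.0, "detail": 2.6,
}))
```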
Content Evaluation
Automatic Scoring
Evaluates the quality of content generated by language models.
Provides multi-dimensional scoring to help identify the best responses.
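
Below is a minimal sketch of response comparison: each candidate is scored and the highest-scoring one is kept. The repository id, the plain mean over attribute logits, and the score() helper are assumptions for illustration; the repository may provide its own comparison utility.

```python
# Minimal sketch: pick the better of several candidate responses by scoring
# each one and comparing a simple aggregate (here, the mean of the assumed
# per-attribute logits). The repo id and head layout are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "RLHFlow/Decision-Tree-Reward-Gemma-2-27B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

def score(prompt: str, response: str) -> float:
    """Mean per-attribute score for a single prompt/response pair."""
    messages = [{"role": "user", "content": prompt},
                {"role": "assistant", "content": response}]
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    with torch.no_grad():
        return model(input_ids).logits.squeeze(0).float().mean().item()

prompt = "Summarize the water cycle in two sentences."
candidates = [
    "Water evaporates, condenses into clouds, and returns as precipitation ...",
    "The water cycle is a cycle of water.",
]
best = max(candidates, key=lambda r: score(prompt, r))
print("Best response:", best)
```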