
RM Gemma 2B

Developed by weqweasdas
A reward model built on google/gemma-2b-it for evaluating text generation quality
Downloads 2,618
Release Time: 2/25/2024

Model Overview

This reward model is trained on top of the google/gemma-2b-it model and is designed to score and rank the quality of candidate text generations, making it suitable for Reinforcement Learning from Human Feedback (RLHF) workflows.
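The sketch below shows one minimal way to load the checkpoint and score a single prompt/response pair. It assumes the model exposes a single-logit sequence-classification head and the gemma-2b-it chat template; the model card on the Hub documents the exact loading recipe, so treat this as an illustration rather than the official usage.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the checkpoint exposes a single-logit sequence-classification head
# and the gemma-2b-it chat template; check the model card for the exact recipe.
model_name = "weqweasdas/RM-Gemma-2B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1, torch_dtype=torch.bfloat16
)
model.eval()

def reward_score(prompt: str, response: str) -> float:
    """Return a scalar quality score for one prompt/response pair."""
    chat = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    text = tokenizer.apply_chat_template(chat, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # The single logit is read as the reward.
        return model(**inputs).logits[0, 0].item()

print(reward_score("What is RLHF?", "RLHF fine-tunes a model using human preference feedback."))
```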

Model Features

Multi-source dataset training
Trained on a mixture of six high-quality preference datasets, including HH-RLHF, SHP, and UltraFeedback, totaling 250,000 comparison samples
Rigorous data cleaning
Applies several filtering strategies to ensure the quality of the comparison data, such as retaining only samples with a significant score difference and removing equal-score samples
Efficient training configuration
Uses a tuned training setup with a learning rate of 1e-5, a batch size of 256, and cosine learning rate decay (a minimal training sketch follows this list)
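The following is a minimal sketch of the pairwise (Bradley-Terry style) objective typically used for this kind of reward-model training, together with the optimizer settings reported above. The nn.Linear stand-in replaces the Gemma-2B backbone and the data is random; this illustrates the objective and schedule, not the author's actual training script.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pairwise_reward_loss(chosen_scores, rejected_scores):
    """Bradley-Terry pairwise loss: push chosen responses above rejected ones."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Stand-in for the Gemma-2B backbone plus scalar reward head (illustration only).
reward_model = nn.Linear(16, 1)

# Settings mirroring the card: learning rate 1e-5 with cosine decay; in practice
# a batch size of 256 is typically reached via gradient accumulation.
num_training_steps = 1000
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_training_steps)

# One illustrative step on random features for a batch of 256 comparison pairs.
chosen, rejected = torch.randn(256, 16), torch.randn(256, 16)
loss = pairwise_reward_loss(reward_model(chosen), reward_model(rejected))
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```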

Model Capabilities

Text quality scoring
Generation result ranking (see the ranking sketch after this list)
Dialogue response evaluation
Reinforcement learning feedback
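As an illustration of the ranking capability, the helper below reuses reward_score() from the scoring sketch above to order a set of candidate responses; the prompt and candidates are made up for the example.

```python
def rank_responses(prompt: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Score each candidate with the reward model and sort best-first.

    Reuses reward_score() from the scoring sketch above.
    """
    scored = [(reward_score(prompt, resp), resp) for resp in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

prompt = "Explain what a reward model does."
candidates = [
    "It scores how well a response matches learned human preferences.",
    "idk",
    "A reward model assigns a scalar score used to rank or filter generations.",
]
for score, response in rank_responses(prompt, candidates):
    print(f"{score:+.3f}  {response}")
```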

Use Cases

Reinforcement learning
Rejection sampling fine-tuning
Used in the rejection sampling phase of RLHF workflows to filter high-quality generation results
Can be applied directly to the RAFT (Reward rAnked FineTuning) algorithm (a rejection-sampling sketch follows this list)
Dialogue systems
Chatbot response evaluation
Evaluates the quality of different chatbot responses to select the best reply
Performs well on benchmarks such as MT-Bench
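A minimal sketch of how the reward model could slot into rejection sampling and RAFT-style data filtering. Here generate_fn is a hypothetical sampling function for the policy model and reward_score() is the helper from the scoring sketch above; this illustrates the workflow under those assumptions, not the reference RAFT implementation.

```python
from typing import Callable

def rejection_sample(prompt: str,
                     generate_fn: Callable[[str], str],
                     num_samples: int = 8) -> tuple[str, float]:
    """Best-of-N selection: sample N responses and keep the highest-reward one.

    generate_fn is a hypothetical sampler (e.g. the policy model's generate call);
    reward_score() comes from the scoring sketch above.
    """
    best_response, best_score = "", float("-inf")
    for _ in range(num_samples):
        response = generate_fn(prompt)
        score = reward_score(prompt, response)
        if score > best_score:
            best_response, best_score = response, score
    return best_response, best_score

def build_raft_dataset(prompts: list[str],
                       generate_fn: Callable[[str], str]) -> list[dict]:
    """RAFT-style filtering: keep only the winning pairs as fine-tuning data."""
    dataset = []
    for prompt in prompts:
        response, score = rejection_sample(prompt, generate_fn)
        dataset.append({"prompt": prompt, "response": response, "reward": score})
    return dataset
```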