
GRM-Gemma2-2B-rewardmodel-ft

Developed by Ray2333
A high-performance 2B-parameter reward model fine-tuned from GRM-Gemma2-2B-sftreg; it performs strongly on reward benchmarks and surpasses multiple 8B reward models
Downloads 1,187
Release Time: 10/23/2024

Model Overview

This model is a 2B-parameter reward model based on the Gemma2 architecture, designed to evaluate and score the quality of generated text, and it performs strongly across dimensions such as dialogue, safety, and reasoning

Model Features

High performance
Achieved a score of 88.4 on the reward benchmark (RewardBench), surpassing multiple 8B reward models as well as GPT-4 and Gemini
Small-model advantage
As a 2B-parameter model, it achieves SOTA performance among models smaller than 3B
Broad evaluation coverage
Performs consistently well across dimensions such as dialogue, hard dialogue, safety, and reasoning

Model Capabilities

Text quality assessment
Dialogue scoring
Safe content recognition
Reasoning ability assessment
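
As an illustration of the scoring capability, here is a minimal sketch of loading the checkpoint with Hugging Face transformers and scoring one dialogue turn. The loading arguments and the assumption that the model exposes a single-logit sequence-classification head are assumptions made for this sketch, not details taken from this page.

```python
# Minimal scoring sketch (assumption: the checkpoint loads as a
# single-logit sequence-classification reward head via transformers).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Ray2333/GRM-Gemma2-2B-rewardmodel-ft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# Format one user/assistant exchange with the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Explain what a reward model does."},
    {"role": "assistant",
     "content": "A reward model assigns a score to a response so that RLHF "
                "training can prefer answers humans would rate highly."},
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    # The single logit is read as the scalar reward for this response.
    reward = model(input_ids).logits[0, 0].item()
print(f"reward score: {reward:.3f}")
```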

Use Cases

Reinforcement learning training
RLHF training
Serves as the reward model in reinforcement learning, guiding the optimization of the language model
Helps train language models that better align with human preferences (a best-of-n selection sketch follows at the end of this section)
Content assessment
Dialogue quality scoring
Evaluates the quality of chatbot responses
Scored 93.0 in the dialogue dimension, outperforming GPT-4 and Gemini
Safe content recognition
Identifies potentially unsafe or inappropriate text content
Scored 92.2 in the safety dimension, a strong result
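
For the RLHF-style use case, a common pattern is to score several candidate replies to the same prompt and keep the highest-scoring one (best-of-n selection). The sketch below assumes the `model` and `tokenizer` from the previous example; `score_response` and `best_of_n` are hypothetical helper names introduced here, not part of the released model.

```python
# Best-of-n selection sketch (assumes `model` and `tokenizer` from the
# previous example; `score_response` / `best_of_n` are hypothetical helpers).
import torch

def score_response(model, tokenizer, prompt: str, response: str) -> float:
    """Return the scalar reward assigned to one (prompt, response) pair."""
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0, 0].item()

def best_of_n(model, tokenizer, prompt: str, candidates: list[str]) -> str:
    """Keep the candidate reply the reward model scores highest."""
    return max(candidates,
               key=lambda c: score_response(model, tokenizer, prompt, c))
```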