
GRM-Llama3.2-3B-rewardmodel-ft

Developed by Ray2333
A 3B-parameter reward model based on the Llama 3.2 architecture that scores 90.9 on the RewardBench evaluation, outperforming multiple 8B reward models
Downloads 3,464
Release Time: 10/23/2024

Model Overview

This reward model is fine-tuned from the GRM-llama3.2-3B-sftreg model on the Skywork preference dataset v0.2, achieving state-of-the-art performance among 3B reward models
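As a rough sketch of how a reward model like this is typically used, the snippet below loads it as a sequence-classification model and scores a single conversation. The repository id, the bfloat16 dtype, and the assumptions that the tokenizer ships a chat template and that the model exposes a single-logit reward head are inferred from how similar GRM reward models are published, not confirmed by this page.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repository id, derived from the model name on this page.
model_id = "Ray2333/GRM-Llama3.2-3B-rewardmodel-ft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

# A single-turn conversation to be scored.
conversation = [
    {"role": "user", "content": "Explain why the sky is blue."},
    {"role": "assistant", "content": "Sunlight is scattered by air molecules; "
     "shorter blue wavelengths scatter the most, so the sky appears blue."},
]

# Format the conversation with the tokenizer's chat template and score it.
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    # The single classification logit is used as the scalar reward.
    reward = model(input_ids).logits[0][0].item()

print(f"Reward score: {reward:.3f}")
```

Higher scores indicate responses the model judges as better aligned with human preferences; the raw value is only meaningful relative to scores of other responses.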

Model Features

High-Performance 3B Reward Model
Outperforms multiple 8B reward models at the 3B-parameter scale, with a RewardBench score of 90.9
Trained on High-Quality Dataset
Fine-tuned using the cleaned Skywork preference dataset v0.2
Versatile Evaluation Capabilities
Performs well across multiple evaluation dimensions, including chat, challenging conversations, safety, and reasoning

Model Capabilities

Text Preference Scoring
Dialogue Quality Evaluation
Safe Content Identification
Reasoning Ability Assessment

Use Cases

Reinforcement Learning
RLHF Training
Serves as a reward signal provider in reinforcement learning
Helps train AI models that better align with human preferences; a minimal reward-function sketch follows
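In RLHF-style training, the reward model usually enters the loop as a callable that maps prompt/response pairs to scalar rewards. The wrapper below is a generic sketch that reuses the `model` and `tokenizer` loaded above; the exact reward-callback signature expected by a given RLHF library will differ, so treat this as an illustration rather than a drop-in integration.

```python
import torch

def reward_fn(prompts, responses, tokenizer, model):
    """Score a batch of (prompt, response) pairs with the reward model.

    Returns one scalar reward per pair. RLHF libraries differ in the exact
    callback signature they expect, so adapt this wrapper accordingly.
    """
    rewards = []
    for prompt, response in zip(prompts, responses):
        conversation = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
        input_ids = tokenizer.apply_chat_template(
            conversation, tokenize=True, return_tensors="pt"
        ).to(model.device)
        with torch.no_grad():
            # The single classification logit is used as the scalar reward.
            rewards.append(model(input_ids).logits[0][0].item())
    return rewards
```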
Content Evaluation
Dialogue Quality Scoring
Evaluates the response quality of AI assistants
Identifies high-quality and low-quality responses
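A common evaluation pattern is best-of-n selection: score several candidate responses to the same prompt and keep the highest-scoring one. The sketch below reuses `reward_fn`, `tokenizer`, and `model` from the earlier snippets; the prompt and candidate strings are made up for illustration.

```python
# Best-of-n selection: score several candidates for one prompt, keep the best.
prompt = "Summarize the plot of Hamlet in two sentences."
candidates = [
    "Hamlet is a play by Shakespeare.",
    "Prince Hamlet, urged on by his father's ghost, seeks revenge against his "
    "uncle Claudius. The resulting intrigue ends with the deaths of Hamlet, "
    "Claudius, and most of the Danish court.",
]

# Reuses reward_fn, tokenizer, and model from the sketches above.
scores = reward_fn([prompt] * len(candidates), candidates, tokenizer, model)
best_index = max(range(len(candidates)), key=lambda i: scores[i])
print(f"Best response (score {scores[best_index]:.3f}): {candidates[best_index]}")
```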