Skywork Reward Gemma 2 27B V0.2

Developed by Skywork
A high-performance reward model built on the Gemma-2-27B architecture and trained on the decontaminated Skywork-Reward-Preference-80K-v0.2 dataset. It excels at preference judgment in complex scenarios.
Downloads 9,496
Release Date: 10/14/2024

Model Overview

This reward model evaluates and ranks the quality of text responses, and performs strongly across multiple domains including mathematics, coding, and safety.

Model Features

High-quality Dataset
Trained on the Skywork-Reward-Preference-80K-v0.2 dataset, from which contaminated sample pairs have been removed
Multi-domain Capability
Excels in preference judgment across multiple domains including mathematics, coding, and safety
High Performance
Ranked first on the RewardBench leaderboard at the time of release, with a total score of 94.3
Optimized Training Strategy
Employs special data selection and scoring strategies to optimize model performance

Model Capabilities

Text Preference Scoring
Multi-domain Judgment
Complex Scenario Evaluation
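
To illustrate text preference scoring: a reward model of this kind typically assigns each response a scalar score, and two responses are compared through their score difference. The sketch below assumes a Bradley-Terry style reading of that difference, which is a common convention and not something this page confirms for the model. The function names and the example scores are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that response A is preferred over response B.

    Assumption: scores are scalar rewards interpreted under a
    Bradley-Terry model, so only the difference between them matters.
    """
    return sigmoid(score_a - score_b)


# Hypothetical scores for two responses to the same prompt.
score_a, score_b = 2.0, -1.0
p_a = preference_probability(score_a, score_b)
print(f"P(A preferred over B) = {p_a:.3f}")
```

Because only the difference matters under this assumption, adding the same constant to both scores leaves the preference unchanged.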

Use Cases

AI Training
Reinforcement Learning Reward Model
Serves as a reward signal provider in reinforcement learning
Improves AI model training efficiency
Content Evaluation
Response Quality Scoring
Evaluates the quality of AI-generated responses
Accurately distinguishes between high-quality and low-quality responses
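
As a rough illustration of response quality scoring, the sketch below ranks candidate responses with a pluggable scoring function, in the style of best-of-n selection. `toy_score` is a hypothetical placeholder based on word overlap and is not the reward model. In practice it would be replaced by a call that returns the model's score for a prompt and response pair.

```python
from typing import Callable, List, Sequence, Tuple

ScoreFn = Callable[[str, str], float]


def rank_responses(
    prompt: str, responses: Sequence[str], score_fn: ScoreFn
) -> List[Tuple[float, str]]:
    """Return (score, response) pairs sorted from highest to lowest score."""
    scored = [(score_fn(prompt, response), response) for response in responses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in scorer: fraction of prompt words found in the response.

    This only exists so the example runs without downloading a model.
    """
    prompt_words = set(prompt.lower().split())
    response_words = set(response.lower().split())
    return len(prompt_words & response_words) / max(len(prompt_words), 1)


prompt = "sort a list in python"
candidates = [
    "bananas are yellow",
    "use sorted on the list in python",
]
ranked = rank_responses(prompt, candidates, toy_score)
best_score, best_response = ranked[0]
print(best_response)
```

Keeping the scorer as a parameter means the same ranking code can serve both content evaluation and reward computation during training.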