S

Skywork Reward Llama 3.1 8B V0.2

Developed by Skywork
An advanced reward model built on the Llama-3.1-8B-Instruct architecture, trained with 80K high-quality preference pairs, excelling in handling preference issues in complex scenarios.
Downloads 25.99k
Release Time : 10/14/2024

Model Overview

This model is a text classification model specifically designed for evaluating and rewarding dialogue response quality, applicable to multiple domains such as mathematics, programming, and safety.

Model Features

High-quality Data Training
Trained with carefully selected 80K high-quality preference pairs to ensure excellent model performance.
Multi-domain Coverage
Covers multiple domains including mathematics, programming, and safety, capable of handling preference issues in complex scenarios.
Purified Dataset
Uses the purified dataset version v0.2, avoiding contamination issues with RewardBench evaluation prompts.

Model Capabilities

Text Classification
Dialogue Response Quality Evaluation
Multi-domain Preference Judgment

Use Cases

Dialogue Systems
Dialogue Response Scoring
Evaluates the quality of responses generated in dialogue systems to select the optimal response.
Ranked first among 8B models on the RewardBench leaderboard.
Education
Math Problem Solution Evaluation
Evaluates the quality of students' solutions to math problems and provides feedback.
Can accurately distinguish between correct and incorrect math solutions.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase