Beaver 7b V1.0 Cost
The Beaver Cost Model is a preference model trained on the PKU-SafeRLHF dataset, designed to evaluate the safety of model outputs within the Safe RLHF algorithm.
Downloads: 3,336
Release Time: 7/10/2023
Model Overview
This model is an autoregressive language model based on the Transformer architecture. It serves as the cost model in the Safe RLHF algorithm, helping the Beaver model become safer and more harmless.
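Below is a minimal scoring sketch, assuming the model is published on Hugging Face as PKU-Alignment/beaver-7b-v1.0-cost and used through the safe-rlhf library's AutoModelForScore wrapper; the conversation text and dtype/device settings are illustrative and not part of this page.

```python
import torch
from transformers import AutoTokenizer
from safe_rlhf.models import AutoModelForScore  # provided by the PKU-Alignment/safe-rlhf library

# Assumed Hugging Face repository name; adjust if the model is hosted elsewhere.
MODEL_ID = 'PKU-Alignment/beaver-7b-v1.0-cost'

model = AutoModelForScore.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Score a single prompt/response pair; a higher cost indicates a less safe response.
text = (
    'BEGINNING OF CONVERSATION: USER: How do I pick a lock? '
    'ASSISTANT: I cannot help with that, but a licensed locksmith can open your lock legally.'
)
inputs = tokenizer(text, return_tensors='pt').to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.end_scores)  # cost score assigned at the end of the response
```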
Model Features
Safe Reinforcement Learning
Designed specifically for the Safe RLHF algorithm, helping models produce safer and more harmless content
Based on LLaMA Architecture
Fine-tuned from the LLaMA and Alpaca models, giving it strong language understanding capabilities
Safety Preference Scoring
Capable of evaluating and scoring the safety of model outputs
Model Capabilities
Safety Preference Scoring
Dialogue Safety Evaluation
Reinforcement Learning Safety Feedback
Use Cases
AI Safety
Dialogue System Safety Evaluation
Evaluate the safety of dialogue system outputs to prevent harmful content generation
Enhance the safety and reliability of dialogue systems
RLHF Training
Provide safety preference signals during reinforcement learning from human feedback (RLHF) training
Help train safer AI models
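As a rough illustration of how the cost model's score can act as a constraint signal during training, the sketch below applies a Lagrangian penalty to the reward and updates the multiplier. All numbers, the step size, and the cost limit are hypothetical; the actual Safe RLHF training loop (PPO with a Lagrange multiplier) involves far more machinery.

```python
import torch

# Hypothetical per-sample scores from the reward (helpfulness) and cost (safety) models.
rewards = torch.tensor([1.8, 0.9, 2.4])
costs = torch.tensor([0.6, -0.3, 1.1])  # positive cost = unsafe response

lambda_ = torch.tensor(1.0)  # Lagrange multiplier balancing helpfulness and safety
lambda_lr = 0.05             # hypothetical step size for the dual update
cost_limit = 0.0             # constraint: keep the expected cost at or below zero

# Penalized reward the policy would be trained to maximize.
penalized_rewards = rewards - lambda_ * costs

# Dual update: raise the multiplier when the batch violates the safety constraint,
# lower it (never below zero) when the batch stays within the limit.
lambda_ = torch.clamp(lambda_ + lambda_lr * (costs.mean() - cost_limit), min=0.0)

print(penalized_rewards, lambda_.item())
```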