Llama 3 8B SFR SFT R
A supervised fine-tuned model built on LLaMA-3-8B, developed by Salesforce for the supervised fine-tuning (SFT) stage of reinforcement learning from human feedback (RLHF) workflows.
Release Date: 5/10/2024
Model Overview
This model is the supervised fine-tuned checkpoint from the Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R release. It is primarily used for text generation tasks and serves as the starting point for reinforcement learning from human feedback (RLHF) workflows.
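As an SFT checkpoint derived from LLaMA-3-8B, the model is presumably prompted with the standard Llama 3 Instruct chat format; the sketch below shows how such a prompt is assembled. This is an assumption about the template, not something stated in this card — in practice you would let the model's tokenizer apply its own `chat_template`.

```python
def format_llama3_chat(messages):
    """Assemble a Llama-3-style chat prompt from (role, content) pairs.

    Assumes the model follows the standard Llama 3 Instruct chat
    template; check the tokenizer's chat_template for the exact format.
    """
    parts = ["<|begin_of_text|>"]
    for role, content in messages:
        # Each turn is wrapped in role headers and closed with <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    # Leave the prompt open for the assistant's reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_chat([("user", "Summarize RLHF in one sentence.")])
print(prompt)
```

With the Hugging Face `transformers` library, the equivalent is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which reads the template shipped with the model instead of hard-coding it.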
Model Features
Supervised Fine-Tuning Optimization
Fine-tuned specifically to serve as the supervised fine-tuning stage of reinforcement learning from human feedback (RLHF) workflows, improving model performance on downstream tasks.
Iterative DPO Support
Supports iterative direct preference optimization (DPO), suitable for complex reinforcement learning from human feedback scenarios.
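To make the DPO stage this checkpoint feeds into concrete, here is a minimal sketch of the per-pair DPO loss. This is a generic illustration of the standard DPO objective, not Salesforce's implementation; the function name and log-probability inputs are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under
    the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen over the rejected response, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid: the loss shrinks as the policy widens
    # the margin in favour of the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

In the iterative variant, each DPO round's output policy becomes the next round's starting point, with fresh preference pairs collected against it.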
Multi-Stage Model Release
The release covers the full workflow: the supervised fine-tuned model, a reward model, and the final RLHF-aligned model.
Model Capabilities
Text Generation
Reinforcement Learning from Human Feedback Support
Supervised Fine-Tuning Optimization
Use Cases
Academic Research
RLHF Research
Used to study the supervised fine-tuning phase in reinforcement learning from human feedback (RLHF) workflows.
Improves model performance on specific tasks.
Text Generation
High-Quality Text Generation
Generates high-quality text content suitable for various natural language processing tasks.
Produces fluent and coherent text.