
Llama 3 8B SFR SFT R

Developed by Salesforce
A supervised fine-tuned model based on LLaMA-3-8B, built for the supervised fine-tuning stage of reinforcement learning from human feedback (RLHF) workflows.
Release Time: 5/10/2024

Model Overview

This model is the supervised fine-tuned (SFT) counterpart of Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R. It is used primarily for text generation and serves as the starting checkpoint for the preference-optimization stage of reinforcement learning from human feedback (RLHF) workflows.
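Since the model is used for chat-style text generation, the following is a minimal sketch of assembling a prompt by hand. The special tokens below assume this SFT model keeps the stock Llama 3 Instruct chat template (an assumption; the model card does not state it). In practice, `tokenizer.apply_chat_template` from Hugging Face `transformers` is the safer route.

```python
def format_llama3_prompt(user_message, system_message=None):
    """Assemble a Llama-3-style chat prompt by hand.

    Assumes the standard Llama 3 Instruct chat template: each turn is
    wrapped in header tokens and terminated by <|eot_id|>, and the
    prompt ends with an open assistant header for the model to fill.
    """
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append("<|start_header_id|>system<|end_header_id|>\n\n"
                     f"{system_message}<|eot_id|>")
    parts.append("<|start_header_id|>user<|end_header_id|>\n\n"
                 f"{user_message}<|eot_id|>")
    # Leave the assistant header open so generation continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt("Summarize RLHF in one sentence.",
                              system_message="You are a helpful assistant.")
print(prompt)
```

The resulting string can be passed directly to the tokenizer and `model.generate`, stopping on the `<|eot_id|>` token.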

Model Features

Supervised Fine-Tuning Optimization
Fine-tuned specifically to serve as the initialization checkpoint in reinforcement learning from human feedback (RLHF) workflows, improving downstream alignment quality.
Iterative DPO Support
Supports iterative direct preference optimization (DPO), suitable for complex reinforcement learning from human feedback scenarios.
Multi-Stage Model Release
Part of a release that covers the full alignment workflow: the supervised fine-tuned model, the reward model, and the final RLHF-aligned model.
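To make the iterative DPO feature concrete, here is a minimal sketch of the standard DPO loss for one preference pair, written with only the Python standard library. The function name and toy log-probability values are illustrative, not taken from the Salesforce release.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the current policy and the frozen reference model
    (in iterative DPO, typically the previous round's checkpoint).
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) == log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))

# Toy log-probs: the policy already favors the chosen response
# slightly more than the reference model does.
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0)
print(f"{loss:.4f}")  # 0.5130
```

The loss shrinks as the policy increases its preference for the chosen response relative to the reference model; iterating this with fresh preference data from each new checkpoint gives the "iterative DPO" setup described above.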

Model Capabilities

Text Generation
Reinforcement Learning from Human Feedback Support
Supervised Fine-Tuning Optimization

Use Cases

Academic Research
RLHF Research
Used to study the supervised fine-tuning phase in reinforcement learning from human feedback (RLHF) workflows.
Improves model performance on specific tasks.
Text Generation
High-Quality Text Generation
Generates high-quality text content suitable for various natural language processing tasks.
Produces fluent and coherent text.