Eleuther Pythia-2.8b HH SFT
A causal language model based on EleutherAI's Pythia-2.8b, fine-tuned on the Anthropic HH human preference dataset
Release Time: 8/7/2023
Model Overview
This is a large language model supervised fine-tuned (SFT) on human preference data, as the model name indicates, focused on generating text content that aligns with human preferences
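The checkpoint can be exercised with a few lines of Hugging Face transformers code. Below is a minimal inference sketch; the repository id lomahony/eleuther-pythia2.8b-hh-sft is assumed from the listing title, not confirmed by this page, and may need to be replaced with the actual model path.

```python
# Minimal inference sketch. The model id is assumed from the listing title;
# replace it with the actual repository path if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lomahony/eleuther-pythia2.8b-hh-sft"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Anthropic HH-style single-turn prompt (Human/Assistant format).
prompt = "\n\nHuman: How do I brew a good cup of coffee?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```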
Model Features
Human preference alignment
Fine-tuning on human preference data makes the model's outputs more aligned with human values and preferences
Transparency and reproducibility
Complete training logs and evaluation methods are publicly available to ensure research reproducibility
Efficient training
A significant improvement over the base model is achieved with just 1 training epoch (a hypothetical setup is sketched below)
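The one-epoch recipe could look like the following sketch, which trains the base Pythia-2.8b on the "chosen" responses of the Anthropic/hh-rlhf dataset with the transformers Trainer. All hyperparameters here are illustrative assumptions, not the published training configuration.

```python
# Hypothetical 1-epoch SFT sketch; hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_id = "EleutherAI/pythia-2.8b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Pythia has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_id)

# Assumed data source: the preferred ("chosen") responses from Anthropic/hh-rlhf.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")

def tokenize(batch):
    return tokenizer(batch["chosen"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="pythia2.8b-hh-sft",
        num_train_epochs=1,          # the single epoch highlighted above
        per_device_train_batch_size=4,
        learning_rate=1e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```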
Model Capabilities
Text generation
Dialogue systems
Preference-aligned text generation
Open-domain QA
Use Cases
Dialogue systems
Intelligent assistant
Building dialogue assistants that align with human preferences
Generates safer and more helpful responses
Content generation
Safe text generation
Generating content that adheres to ethical standards
Reduces the production of harmful or biased content
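For the dialogue use cases above, HH-style checkpoints conventionally expect conversations rendered in the Anthropic "\n\nHuman: ... \n\nAssistant: ..." turn format. The helper below is a hypothetical illustration of that convention for multi-turn prompts, not an API shipped with the model.

```python
# Hypothetical helper that formats a multi-turn conversation in the
# Anthropic HH "\n\nHuman: ... \n\nAssistant: ..." convention.
def format_hh_prompt(turns):
    """turns: list of (speaker, text) pairs, speaker in {"Human", "Assistant"}."""
    prompt = "".join(f"\n\n{speaker}: {text}" for speaker, text in turns)
    return prompt + "\n\nAssistant:"  # leave the final turn open for generation

print(format_hh_prompt([
    ("Human", "Can you suggest a polite way to decline a meeting?"),
    ("Assistant", "Sure. You could say you have a prior commitment."),
    ("Human", "How do I phrase that in an email?"),
]))
```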