🚀 hh-rlhf
This model is a fine-tuned version of [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1), designed to study the intersection of instruct models and prompting.
🚀 Quick Start
This model is a fine-tuned version of [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1) on a subset (15k examples) of the Anthropic/hh-rlhf dataset. It achieves a loss of 2.1534 on the evaluation set.
This model responds well to 'Human:' or 'Assistant:' prompts in conversational settings. Shorter responses suit it best, and it is advisable to keep the generation length within a reasonable range; otherwise it may produce rather esoteric responses, including fairly uncensored remarks and occasional violent outbursts, especially when answering questions. It needs vetting before use on other kinds of text.
Human: Insane clown posse says...
Human: Should we look for a woman?
Assistant: It’s okay if you’re having a tough time finding what you are looking for. It’s a common question people might come up with for an argument or misunderstanding. What are you looking for, and what kind of woman would you have?
Human: Are you trying to find someone to argue
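Below is a minimal generation sketch using the transformers library. The repository id and the sampling settings are illustrative assumptions rather than published defaults; substitute the actual model id when loading this checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# NOTE: placeholder model id; replace with the actual repository id of this
# fine-tuned checkpoint.
model_id = "your-username/gpt2-open-instruct-hh-rlhf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The model responds best to the 'Human:' / 'Assistant:' conversation format.
prompt = "Human: Should we look for a woman?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

# Keep generations short, as recommended above; long outputs tend to drift.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```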
✨ Features
- Fine-Tuned Model: Based on [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1), fine-tuned on a subset of the Anthropic/hh-rlhf dataset.
- Conversation - Oriented: Responds well to 'Human:' and 'Assistant:' prompts in conversations.
📚 Documentation
Model description
GPT2 open instruct was trained fully on the open-instruct dataset. This fine-tune reimagines the LM head as a partial RLHF adapter, with subtle reinforcements.
Intended uses & limitations
Intended to study the intersection of instruct models and prompting, with a focus on subtle prompting exchanges. It likely needs substantial further refinement at this point.
Training and evaluation data
Train dataset size: 15000
Test dataset size: 500
Dataset({
    features: ['chosen', 'rejected'],
    num_rows: 15000
})
Dataset({
    features: ['chosen', 'rejected'],
    num_rows: 500
})
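A hedged sketch of how such a subset might be drawn from Anthropic/hh-rlhf with the datasets library; the exact selection logic used for this model is not documented, so the row selection below is an assumption.

```python
from datasets import load_dataset

# Load Anthropic/hh-rlhf (features: 'chosen', 'rejected').
raw = load_dataset("Anthropic/hh-rlhf")

# ASSUMPTION: the 15k train / 500 test subsets are simply the first N rows
# of each split; the actual sampling used for this model is not documented.
train_subset = raw["train"].select(range(15_000))
test_subset = raw["test"].select(range(500))

print(train_subset)  # Dataset({features: ['chosen', 'rejected'], num_rows: 15000})
print(test_subset)   # Dataset({features: ['chosen', 'rejected'], num_rows: 500})
```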
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 4
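As a rough illustration, the hyperparameters above map onto transformers `TrainingArguments` as sketched below. This is a reconstruction under stated assumptions, not the actual training script; the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Only the values listed in the hyperparameter list are taken from the run;
# everything else (e.g. output_dir) is an assumption.
training_args = TrainingArguments(
    output_dir="gpt2-open-instruct-hh-rlhf",  # placeholder path
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=4,
)
```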
Training results
| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
| 2.3108 | 1.0 | 7500 | 2.1799 |
| 2.265 | 2.0 | 15000 | 2.1632 |
| 2.2507 | 3.0 | 22500 | 2.1567 |
| 2.2519 | 4.0 | 30000 | 2.1534 |
Framework versions
- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
📄 License
This model is released under the MIT license.
📦 Information Table
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of vicgalle/gpt2-open-instruct-v1 |
| Training Data | Anthropic/hh-rlhf (15k subset), hakurei/open-instruct-v1 |
| Tokenizers | GPT2Tokenizer |
| Library Name | transformers |
| Metrics | bleu |