🚀 distilgpt2-tiny-conversational
This is a fine-tuned version of distilgpt2 on a parsed version of the Wizard of Wikipedia dataset. It uses the person alpha/person beta conversation framework and is designed for use with ai-msgbot. The model achieves a loss of 2.2461 on the evaluation set.
✨ Features
- A basic dialogue model that can be used as a chatbot.
- Check out a simple demo here
📦 Installation
To load the model you need the Hugging Face transformers library and PyTorch (a minimal setup; exact version pins are listed under Framework Versions below):
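```bash
pip install transformers torch
```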
💻 Usage Examples
Basic Usage
You can interact with the model through the provided widget examples. Each example is a prompt followed by the `person beta:` cue, which tells the model to generate the next response:

- I know you're tired, but can we go for another walk this evening?
- Have you done anything exciting lately?
- hey - do you have a favorite grocery store around here?
- Can you take me for dinner somewhere nice this time?
- What's your favorite form of social media?
- Hi, how are you?
- I am the best; my sister is the worst. What am I?
- What do you call an alligator who's just had surgery to remove his left arm?
- A man walks into a bar and asks for a drink. The bartender asks for $10, and he pays him $1. What did he pay him with?
- What did I say was in the mailbox when it was actually in the cabinet?
- My friend says that she knows every language, but she doesn't speak any of them... what's wrong with her?
Inference Parameters
The following parameters are used for inference:
| Parameter | Value |
|:---------|:-----|
| min_length | 2 |
| max_length | 64 |
| length_penalty | 0.7 |
| no_repeat_ngram_size | 2 |
| do_sample | True |
| top_p | 0.95 |
| top_k | 20 |
| temperature | 0.3 |
| repetition_penalty | 3.5 |
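A minimal sketch of running generation with these parameters via the transformers text-generation pipeline. The hub id and the prompt layout with the person cues are assumptions inferred from the widget examples, not part of the original card:

```python
from transformers import pipeline

# Hub id is an assumption; substitute the actual checkpoint path if different.
chatbot = pipeline(
    "text-generation",
    model="ethzanalytics/distilgpt2-tiny-conversational",
)

# Prompt format assumed from the widget examples: the "person beta:" cue
# tells the model whose response to generate next.
prompt = "person alpha:\nHi, how are you?\nperson beta:\n"

result = chatbot(
    prompt,
    min_length=2,
    max_length=64,
    length_penalty=0.7,
    no_repeat_ngram_size=2,
    do_sample=True,
    top_p=0.95,
    top_k=20,
    temperature=0.3,
    repetition_penalty=3.5,
)
print(result[0]["generated_text"])
```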
📚 Documentation
Intended Uses & Limitations
- It is designed for integration with this repo: ai-msgbot
- The model generates whole conversations between two entities, person alpha and person beta. These entity names function as custom `<bos>` tokens, marking where one response ends and the next begins (see the parsing sketch below).
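A minimal sketch of extracting individual turns from generated text, assuming the output interleaves the literal person alpha/person beta markers as described above. `split_turns` is a hypothetical helper, not part of ai-msgbot:

```python
import re

def split_turns(conversation):
    """Split raw generated text into (speaker, utterance) pairs.

    Hypothetical helper: assumes the generated text interleaves the
    literal markers 'person alpha' / 'person beta' as turn delimiters.
    """
    # Split on the speaker markers, keeping the markers via the capture group.
    parts = re.split(r"(person alpha|person beta)\s*:", conversation)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((speaker, text))
    return turns

raw = "person alpha: hey there person beta: hi! how are you?"
print(split_turns(raw))
# [('person alpha', 'hey there'), ('person beta', 'hi! how are you?')]
```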
Training and Evaluation Data
The model was trained and evaluated on a parsed version of the Wizard of Wikipedia dataset.
Training Procedure
- Training uses DeepSpeed with the Hugging Face Trainer. An example notebook is available in ai-msgbot.
Training Hyperparameters
The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 30
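A minimal sketch of `TrainingArguments` mirroring the hyperparameters above. The output directory and the DeepSpeed config path are placeholders; the Adam betas and epsilon listed above are the Trainer defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Sketch only: "distilgpt2-tiny-conversational" and "ds_config.json"
# are placeholder paths, not files from the original card.
training_args = TrainingArguments(
    output_dir="distilgpt2-tiny-conversational",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=30,
    deepspeed="ds_config.json",  # DeepSpeed config used with the Trainer
)
```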
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| No log | 1.0 | 418 | 2.7793 |
| 2.9952 | 2.0 | 836 | 2.6914 |
| 2.7684 | 3.0 | 1254 | 2.6348 |
| 2.685 | 4.0 | 1672 | 2.5938 |
| 2.6243 | 5.0 | 2090 | 2.5625 |
| 2.5816 | 6.0 | 2508 | 2.5332 |
| 2.5816 | 7.0 | 2926 | 2.5098 |
| 2.545 | 8.0 | 3344 | 2.4902 |
| 2.5083 | 9.0 | 3762 | 2.4707 |
| 2.4793 | 10.0 | 4180 | 2.4551 |
| 2.4531 | 11.0 | 4598 | 2.4395 |
| 2.4269 | 12.0 | 5016 | 2.4238 |
| 2.4269 | 13.0 | 5434 | 2.4102 |
| 2.4051 | 14.0 | 5852 | 2.3945 |
| 2.3777 | 15.0 | 6270 | 2.3848 |
| 2.3603 | 16.0 | 6688 | 2.3711 |
| 2.3394 | 17.0 | 7106 | 2.3613 |
| 2.3206 | 18.0 | 7524 | 2.3516 |
| 2.3206 | 19.0 | 7942 | 2.3398 |
| 2.3026 | 20.0 | 8360 | 2.3301 |
| 2.2823 | 21.0 | 8778 | 2.3203 |
| 2.2669 | 22.0 | 9196 | 2.3105 |
| 2.2493 | 23.0 | 9614 | 2.3027 |
| 2.2334 | 24.0 | 10032 | 2.2930 |
| 2.2334 | 25.0 | 10450 | 2.2852 |
| 2.2194 | 26.0 | 10868 | 2.2754 |
| 2.2014 | 27.0 | 11286 | 2.2695 |
| 2.1868 | 28.0 | 11704 | 2.2598 |
| 2.171 | 29.0 | 12122 | 2.2539 |
| 2.1597 | 30.0 | 12540 | 2.2461 |
Framework Versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Tokenizers 0.11.0
📄 License
This project is licensed under the Apache-2.0 license.