🚀 distilgpt2-tiny-conversational
This is a fine-tuned version of distilgpt2 on a parsed version of the Wizard of Wikipedia dataset. It uses the person alpha/person beta conversation framework and is designed for use with ai-msgbot. The model achieves a loss of 2.2461 on the evaluation set.
✨ Features
- A basic dialogue model that can be used as a chatbot.
- Check out a simple demo here
📦 Installation
To load the model you need the Hugging Face transformers library and PyTorch (a minimal setup; exact version pins are listed under Framework Versions below):
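```bash
pip install transformers torch
```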
💻 Usage Examples
Basic Usage
You can interact with the model through the provided widget examples. Each example is a prompt followed by the `person beta:` cue, which tells the model to generate the next response:

- I know you're tired, but can we go for another walk this evening?
- Have you done anything exciting lately?
- hey - do you have a favorite grocery store around here?
- Can you take me for dinner somewhere nice this time?
- What's your favorite form of social media?
- Hi, how are you?
- I am the best; my sister is the worst. What am I?
- What do you call an alligator who's just had surgery to remove his left arm?
- A man walks into a bar and asks for a drink. The bartender asks for $10, and he pays him $1. What did he pay him with?
- What did I say was in the mailbox when it was actually in the cabinet?
- My friend says that she knows every language, but she doesn't speak any of them... what's wrong with her?
Inference Parameters
The following parameters are used for inference:
| Parameter | Value |
|:---------|:-----|
| min_length | 2 |
| max_length | 64 |
| length_penalty | 0.7 |
| no_repeat_ngram_size | 2 |
| do_sample | True |
| top_p | 0.95 |
| top_k | 20 |
| temperature | 0.3 |
| repetition_penalty | 3.5 |
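A minimal sketch of running generation with these parameters via the transformers text-generation pipeline. The hub id and the prompt layout with the person cues are assumptions inferred from the widget examples, not part of the original card:

```python
from transformers import pipeline

# Hub id is an assumption; substitute the actual checkpoint path if different.
chatbot = pipeline(
    "text-generation",
    model="ethzanalytics/distilgpt2-tiny-conversational",
)

# Prompt format assumed from the widget examples: the "person beta:" cue
# tells the model whose response to generate next.
prompt = "person alpha:\nHi, how are you?\nperson beta:\n"

result = chatbot(
    prompt,
    min_length=2,
    max_length=64,
    length_penalty=0.7,
    no_repeat_ngram_size=2,
    do_sample=True,
    top_p=0.95,
    top_k=20,
    temperature=0.3,
    repetition_penalty=3.5,
)
print(result[0]["generated_text"])
```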
📚 Documentation
Intended Uses & Limitations
- It is designed for integration with this repo: ai-msgbot
- The model generates whole conversations between two entities, person alpha and person beta. These entity names function as custom `<bos>` tokens, marking where one response ends and the next begins (see the parsing sketch below).
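A minimal sketch of extracting individual turns from generated text, assuming the output interleaves the literal person alpha/person beta markers as described above. `split_turns` is a hypothetical helper, not part of ai-msgbot:

```python
import re

def split_turns(conversation):
    """Split raw generated text into (speaker, utterance) pairs.

    Hypothetical helper: assumes the generated text interleaves the
    literal markers 'person alpha' / 'person beta' as turn delimiters.
    """
    # Split on the speaker markers, keeping the markers via the capture group.
    parts = re.split(r"(person alpha|person beta)\s*:", conversation)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((speaker, text))
    return turns

raw = "person alpha: hey there person beta: hi! how are you?"
print(split_turns(raw))
# [('person alpha', 'hey there'), ('person beta', 'hi! how are you?')]
```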
Training and Evaluation Data
The model was trained and evaluated on a parsed version of the Wizard of Wikipedia dataset.
Training Procedure
- Training uses DeepSpeed with the Hugging Face Trainer. An example notebook is available in ai-msgbot.
Training Hyperparameters
The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 30
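A minimal sketch of `TrainingArguments` mirroring the hyperparameters above. The output directory and the DeepSpeed config path are placeholders; the Adam betas and epsilon listed above are the Trainer defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Sketch only: "distilgpt2-tiny-conversational" and "ds_config.json"
# are placeholder paths, not files from the original card.
training_args = TrainingArguments(
    output_dir="distilgpt2-tiny-conversational",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=30,
    deepspeed="ds_config.json",  # DeepSpeed config used with the Trainer
)
```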
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| No log | 1.0 | 418 | 2.7793 |
| 2.9952 | 2.0 | 836 | 2.6914 |
| 2.7684 | 3.0 | 1254 | 2.6348 |
| 2.685 | 4.0 | 1672 | 2.5938 |
| 2.6243 | 5.0 | 2090 | 2.5625 |
| 2.5816 | 6.0 | 2508 | 2.5332 |
| 2.5816 | 7.0 | 2926 | 2.5098 |
| 2.545 | 8.0 | 3344 | 2.4902 |
| 2.5083 | 9.0 | 3762 | 2.4707 |
| 2.4793 | 10.0 | 4180 | 2.4551 |
| 2.4531 | 11.0 | 4598 | 2.4395 |
| 2.4269 | 12.0 | 5016 | 2.4238 |
| 2.4269 | 13.0 | 5434 | 2.4102 |
| 2.4051 | 14.0 | 5852 | 2.3945 |
| 2.3777 | 15.0 | 6270 | 2.3848 |
| 2.3603 | 16.0 | 6688 | 2.3711 |
| 2.3394 | 17.0 | 7106 | 2.3613 |
| 2.3206 | 18.0 | 7524 | 2.3516 |
| 2.3206 | 19.0 | 7942 | 2.3398 |
| 2.3026 | 20.0 | 8360 | 2.3301 |
| 2.2823 | 21.0 | 8778 | 2.3203 |
| 2.2669 | 22.0 | 9196 | 2.3105 |
| 2.2493 | 23.0 | 9614 | 2.3027 |
| 2.2334 | 24.0 | 10032 | 2.2930 |
| 2.2334 | 25.0 | 10450 | 2.2852 |
| 2.2194 | 26.0 | 10868 | 2.2754 |
| 2.2014 | 27.0 | 11286 | 2.2695 |
| 2.1868 | 28.0 | 11704 | 2.2598 |
| 2.171 | 29.0 | 12122 | 2.2539 |
| 2.1597 | 30.0 | 12540 | 2.2461 |
Framework Versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Tokenizers 0.11.0
📄 License
This project is licensed under the Apache-2.0 license.