🚀 gpt2-conversational-or-qa (prototype)
This model is designed to generate conversational responses. It uses the GPT-2 architecture and is trained on conversational data, aiming to provide natural and relevant answers.
🚀 Quick Start
Prerequisites
- Ensure you have the PyTorch and Transformers frameworks installed.
- A GPU with at least 4GB of VRAM is recommended for optimal performance.
Deployment
```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the base GPT-2 tokenizer and model, then register the conversational special tokens.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.add_special_tokens({'eos_token': '<|End|>'})
special_tokens = {
    "additional_special_tokens": ["<|USER|>", "<|SYSTEM|>", "<|ASSISTANT|>"]
}
tokenizer.add_special_tokens(special_tokens)

# Resize the embeddings for the new tokens, then load the fine-tuned weights.
model.resize_token_embeddings(len(tokenizer))
model.load_state_dict(torch.load("path/to/model"))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def generate_text(model, tokenizer, prompt, max_length=1024):
    # Wrap the user prompt in the conversational format the model was trained on.
    prompt = f'<|USER|> {prompt} <|ASSISTANT|> '
    input_ids = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt").to(device)
    attention_mask = torch.ones_like(input_ids).to(device)
    output = model.generate(input_ids,
                            max_length=max_length,
                            do_sample=True,
                            top_k=35,
                            top_p=0.80,
                            pad_token_id=tokenizer.pad_token_id,
                            eos_token_id=tokenizer.eos_token_id,
                            attention_mask=attention_mask)
    # Decode with special tokens kept so the assistant's reply can be sliced out.
    output_text = tokenizer.decode(output[0], skip_special_tokens=False)
    assistant_token_index = output_text.index('<|ASSISTANT|>') + len('<|ASSISTANT|>')
    next_token_index = output_text.find('<|', assistant_token_index)
    return output_text[assistant_token_index:next_token_index]

# Loop to interact with the model
while True:
    prompt = input("Enter a prompt (or 'q' to quit): ")
    if prompt == "q":
        break
    output_text = generate_text(model, tokenizer, prompt)
    print(output_text)
```
✨ Features
- Conversational Response Generation: Capable of generating natural and engaging conversational responses.
- GPT-2 Architecture: Utilizes the state-of-the-art GPT-2 architecture for high-quality text generation.
- Multi-metric Evaluation: Evaluated on multiple metrics such as BLEU score, perplexity, loss, reward, and penalty.
📦 Installation
No installation is required beyond the prerequisites above: the PyTorch and Transformers frameworks.
💻 Usage Examples
Basic Usage
The basic usage is shown in the deployment code above: enter a prompt and the model returns a conversational response.
Advanced Usage
The model can be fine-tuned on your own conversational data. Each input should be formatted as `<|USER|> {user prompt} <|ASSISTANT|> ` and the corresponding target/label as `<|USER|> {user prompt} <|ASSISTANT|> {dataset output} <|End|>`, as illustrated in the sketch below.
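As a rough illustration of that format, the helper below (a hypothetical name, not part of the model's tooling) assembles a single training example:

```python
# Hypothetical helper that builds one fine-tuning example in the format described above.
def build_example(user_prompt: str, assistant_reply: str) -> dict:
    # Input sequence the model sees during fine-tuning.
    input_text = f"<|USER|> {user_prompt} <|ASSISTANT|> "
    # Target/label: the same prefix followed by the desired reply and the end token.
    target_text = f"<|USER|> {user_prompt} <|ASSISTANT|> {assistant_reply} <|End|>"
    return {"input": input_text, "label": target_text}

example = build_example("What is the capital of France?", "The capital of France is Paris.")
print(example["input"])
print(example["label"])
```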
📚 Documentation
Model Details
| Property | Details |
|---|---|
| Model Name | gpt2-conversational-or-qa (prototype) |
| Model Type | Language Modeling |
| Task | Generating Conversational Responses |
| Hardware | 1x RTX 3060 |
| Description | This model is trained on a dataset of conversations between a user and an AI assistant. It uses the GPT-2 architecture and is fine-tuned on conversational data using maximum likelihood estimation. It is evaluated on its ability to generate grammatically correct and semantically relevant responses. Larger models such as https://huggingface.co/Locutusque/gpt2-medium-conversational and https://huggingface.co/Locutusque/gpt2-large-conversational have also been trained. |
Intended Use
This model is intended for generating conversational responses in various contexts such as chatbots, virtual assistants, and customer service applications. It can be used in both text-based and voice-based interfaces and can be integrated into existing applications using the PyTorch and Transformers frameworks.
Training Data
The model is trained on a large conversational dataset. The data is pre-processed to remove sensitive information and split into a training set and a validation set. The model was trained on 245,000 examples over 1,225,000 steps and achieved decent metrics. A side-by-side comparison of the base GPT-2 and this conversational GPT-2 during the first steps of training follows:
```python
# Base GPT-2
"""
Epoch 1/5, Batch 1/10000: Loss - 64.9255, Reward - 260.0000, Penalty - 624.0000, BLEU - 0.0000
Epoch 1/5, Batch 2/10000: Loss - 57.4635, Reward - 303.0000, Penalty - 870.0000, BLEU - 0.0000
Epoch 1/5, Batch 3/10000: Loss - 67.8061, Reward - 295.0000, Penalty - 908.0000, BLEU - 0.0000
Epoch 1/5, Batch 4/10000: Loss - 59.6118, Reward - 800.0000, Penalty - 740.0000, BLEU - 0.0000
Epoch 1/5, Batch 5/10000: Loss - 67.4855, Reward - 402.0000, Penalty - 806.0000, BLEU - 0.0000
Epoch 1/5, Batch 6/10000: Loss - 29.3718, Reward - 937.0000, Penalty - 760.0000, BLEU - 0.0000
Epoch 1/5, Batch 7/10000: Loss - 79.0709, Reward - 390.0000, Penalty - 1114.0000, BLEU - 0.0000
Epoch 1/5, Batch 8/10000: Loss - 61.4583, Reward - 385.0000, Penalty - 760.0000, BLEU - 0.0000
Epoch 1/5, Batch 9/10000: Loss - 56.3084, Reward - 741.0000, Penalty - 560.0000, BLEU - 3.5500
Epoch 1/5, Batch 10/10000: Loss - 80.0192, Reward - 838.0000, Penalty - 1424.0000, BLEU - 0.0000
Epoch 1/5, Batch 11/10000: Loss - 51.8236, Reward - 228.0000, Penalty - 812.0000, BLEU - 0.0001
Epoch 1/5, Batch 12/10000: Loss - 71.4071, Reward - 541.0000, Penalty - 982.0000, BLEU - 0.0000
Epoch 1/5, Batch 13/10000: Loss - 33.3624, Reward - 910.0000, Penalty - 1002.0000, BLEU - 0.0027
Epoch 1/5, Batch 14/10000: Loss - 55.9721, Reward - 808.0000, Penalty - 798.0000, BLEU - 0.0005
Epoch 1/5, Batch 15/10000: Loss - 67.0336, Reward - 517.0000, Penalty - 764.0000, BLEU - 0.0000
"""
```python
# Conversational GPT-2
"""
Epoch 1/5, Batch 1/10000: Loss - 6.1980, Reward - 887.0000, Penalty - 1500.0000, BLEU - 0.0648
Epoch 1/5, Batch 2/10000: Loss - 4.5750, Reward - 245.0000, Penalty - 1618.0000, BLEU - 0.0008
Epoch 1/5, Batch 3/10000: Loss - 5.1264, Reward - 600.0000, Penalty - 642.0000, BLEU - 5.7981
Epoch 1/5, Batch 4/10000: Loss - 0.2995, Reward - 1020.0000, Penalty - 74.0000, BLEU - 13.8469
Epoch 1/5, Batch 5/10000: Loss - 7.9377, Reward - 203.0000, Penalty - 1700.0000, BLEU - 0.3218
Epoch 1/5, Batch 6/10000: Loss - 5.0522, Reward - 1020.0000, Penalty - 2034.0000, BLEU - 0.1946
Epoch 1/5, Batch 7/10000: Loss - 2.0585, Reward - 925.0000, Penalty - 526.0000, BLEU - 16.1298
Epoch 1/5, Batch 8/10000: Loss - 5.9736, Reward - 1009.0000, Penalty - 1844.0000, BLEU - 0.0085
Epoch 1/5, Batch 9/10000: Loss - 6.0867, Reward - 245.0000, Penalty - 1690.0000, BLEU - 1.9342
Epoch 1/5, Batch 10/10000: Loss - 7.8497, Reward - 155.0000, Penalty - 1780.0000, BLEU - 0.0115
Epoch 1/5, Batch 11/10000: Loss - 3.8887, Reward - 1012.0000, Penalty - 2010.0000, BLEU - 0.6957
Epoch 1/5, Batch 12/10000: Loss - 6.6133, Reward - 216.0000, Penalty - 1638.0000, BLEU - 1.7853
Epoch 1/5, Batch 13/10000: Loss - 1.3319, Reward - 945.0000, Penalty - 374.0000, BLEU - 0.0075
Epoch 1/5, Batch 14/10000: Loss - 2.6296, Reward - 956.0000, Penalty - 414.0000, BLEU - 3.2207
Epoch 1/5, Batch 15/10000: Loss - 6.8827, Reward - 1013.0000, Penalty - 1970.0000, BLEU - 3.7418
"""
Model Architecture
The model uses the GPT-2 architecture, a multi-layered decoder-only transformer with self-attention mechanisms. This allows the model to capture long-term dependencies and generate coherent text.
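As a quick way to see those architectural hyperparameters, the base GPT-2 configuration can be inspected through the Transformers API (a minimal sketch using the public `gpt2` checkpoint):

```python
from transformers import GPT2Config

# Inspect the decoder-only transformer configuration of the base GPT-2 checkpoint.
config = GPT2Config.from_pretrained("gpt2")
print(f"layers={config.n_layer}, heads={config.n_head}, "
      f"hidden size={config.n_embd}, context length={config.n_positions}")
```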
Evaluation Metrics
The model is evaluated on several metrics:
| Metric | Value |
|---|---|
| BLEU Score | 9 |
| Perplexity | 19 |
| Loss | 1.7 |
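The exact evaluation pipeline is not shown here, but perplexity is conventionally computed as the exponential of the average cross-entropy loss; below is a minimal sketch of that calculation on a toy input, using the base `gpt2` checkpoint as a stand-in:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Hello, how are you today?", return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the average cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is the exponential of the average cross-entropy loss.
print(f"loss={loss.item():.3f}, perplexity={torch.exp(loss).item():.3f}")
```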
Limitations and Bias
⚠️ Important Note
This model is not suitable for all use cases due to its limited training time on a weak computer. It may produce irrelevant or nonsensical responses. It has not been fine-tuned to remember the chat history, is unable to provide follow-up responses, and does not know the answer to many questions. For optimal performance, use a GPU with at least 4GB of VRAM and download the model manually instead of using the Transformers library or deploying it on the Inference API.
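One way to download the weights manually is with the `huggingface_hub` client (a sketch; the repository ID below is an assumption based on the model name and the author of the larger variants linked above):

```python
from huggingface_hub import snapshot_download

# Download the full model repository to a local directory for offline loading.
# Repository ID is an assumption, not confirmed by this card.
local_dir = snapshot_download(repo_id="Locutusque/gpt2-conversational-or-qa")
print(local_dir)
```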
💡 Usage Tip
When using the model, ensure the input format is correct as described in the advanced usage section.
🔧 Technical Details
The model is fine-tuned on conversational data using maximum likelihood estimation. The GPT-2 architecture's self-attention mechanisms help the model capture long-term dependencies in the text. During training, the loss metric measures the difference between the predicted output and the actual output.
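As a minimal sketch of what one maximum-likelihood training step looks like with the Transformers API (the learning rate and example text below are placeholders, not the actual training configuration):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # placeholder learning rate

# Placeholder example in the conversational format described above.
text = "<|USER|> Hi there! <|ASSISTANT|> Hello! How can I help you today? <|End|>"
batch = tokenizer(text, return_tensors="pt")

# Maximum likelihood estimation: with labels supplied, the model internally shifts them
# and returns the cross-entropy loss between its predictions and the target tokens.
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())
```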
📄 License
The model is released under the OpenRAIL license.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 25.09 |
| ARC (25-shot) | 21.42 |
| HellaSwag (10-shot) | 27.61 |
| MMLU (5-shot) | 26.51 |
| TruthfulQA (0-shot) | 47.31 |
| Winogrande (5-shot) | 51.14 |
| GSM8K (5-shot) | 0.08 |
| DROP (3-shot) | 1.55 |
Note: This model is deprecated. Please see https://huggingface.co/Locutusque/gpt2-conversational-retrain for a better-performing model.

