A DialoGPT Model Trained on French OpenSubtitles with a Custom Tokenizer
This is a DialoGPT model trained on French OpenSubtitles using a custom tokenizer. It is intended for building French-language conversational applications.
Widget Examples
- "bonjour."
- "mais encore"
- "est ce que l'argent achete le bonheur?"
This model was trained using this notebook:
Training Notebook
The configuration is based on microsoft/DialoGPT-medium. The dataset was generated from the 2018 OpenSubtitles corpus downloaded from OPUS, following these guidelines:
[Dataset Guidelines](https://github.com/PolyAI-LDN/conversational-datasets/tree/master/opensubtitles)
And this notebook was used for dataset generation:
Dataset Generation Notebook
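The linked guidelines build training examples from consecutive subtitle lines. The snippet below is a minimal sketch of that idea, not the code from the notebook: the helper name `make_examples` and the `max_context` parameter are illustrative assumptions, and the sample lines are made up.

```python
from typing import Iterator, List, Tuple


def make_examples(lines: List[str], max_context: int = 3) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, response) pairs from consecutive subtitle lines.

    Hypothetical helper: every line after the first is treated as a response,
    and up to `max_context` preceding lines form its context.
    """
    for i in range(1, len(lines)):
        context = lines[max(0, i - max_context):i]
        yield context, lines[i]


# Made-up subtitle lines, used only to illustrate the pairing.
subtitles = ["bonjour.", "bonjour, ça va ?", "très bien, merci."]
examples = list(make_examples(subtitles, max_context=2))

for context, response in examples:
    print(context, "->", response)
```

Each pair can then be serialized into a single training sequence, with the context turns followed by the response.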
Quick Start
Features
- This is a DialoGPT model trained on French OpenSubtitles.
- It uses a custom tokenizer.
- The configuration is based on microsoft/DialoGPT-medium.
Installation
No installation steps are documented. The usage example imports torch and transformers, so both packages need to be available.
Usage Examples
Basic Usage
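import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cedpsam/chatbot_fr")
# AutoModelWithLMHead is deprecated; AutoModelForCausalLM is its replacement for GPT-2-style models
model = AutoModelForCausalLM.from_pretrained("cedpsam/chatbot_fr")

for step in range(6):
    # Encode the new user input and append the end-of-sequence token
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors="pt")
    # Append the new user input to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
    # Generate a response, limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,  # top_p and top_k only take effect when sampling is enabled
        top_p=0.92,
        top_k=50,
    )
    # Print only the newly generated tokens
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))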
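In the usage example, `chat_history_ids` grows with every turn while `max_length=1000` caps the total sequence length, so a long conversation eventually leaves no room for a reply. One possible way to keep the history bounded is to drop the oldest tokens. The sketch below assumes token ids are held in a plain Python list; `truncate_history`, `append_turn`, and the 900-token default are hypothetical and not part of the model card.

```python
from typing import List


def truncate_history(token_ids: List[int], max_tokens: int) -> List[int]:
    """Keep only the most recent `max_tokens` ids (hypothetical helper)."""
    if max_tokens <= 0:
        # Guard needed because token_ids[-0:] would return the whole list.
        return []
    return token_ids[-max_tokens:]


def append_turn(history: List[int], turn_ids: List[int], eos_id: int,
                max_tokens: int = 900) -> List[int]:
    """Append one turn plus its end-of-sequence id, then trim from the left."""
    return truncate_history(history + turn_ids + [eos_id], max_tokens)


# Toy ids stand in for real tokenizer output; 0 plays the role of eos_token_id.
history: List[int] = []
history = append_turn(history, [11, 12, 13], eos_id=0, max_tokens=6)
history = append_turn(history, [21, 22], eos_id=0, max_tokens=6)
print(history)
```

With the tensors used in the usage example, the equivalent trim would be a slice on the last dimension, such as `chat_history_ids[:, -max_tokens:]`. Trimming by token count can cut a turn in half, so trimming at end-of-sequence boundaries may be preferable in practice.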
License
No license information is provided.