🚀 Optimized ChatBot for Anime Roleplay
This project is an optimized chatbot designed for anime roleplay. It uses the Mistral model to generate responses in anime-themed conversations. The chatbot can handle long-term conversations and is optimized for GPU usage.
🚀 Quick Start
To start using the chatbot, make sure a GPU is available; running the model requires GPU support.
Prerequisites
- Python environment
- GPU with CUDA support
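To confirm the GPU is visible to PyTorch before launching, you can run a quick check like the following (not part of the project code):

import torch

if torch.cuda.is_available():
    print(f"CUDA GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected; the chatbot cannot run.")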
Installation
The chatbot needs the following third-party packages in your Python environment:
- transformers
- torch
- bitsandbytes (for quantization)

The other modules it imports (logging, queue, threading, time, traceback, os, gc) ship with the Python standard library and require no installation.
Running the Chatbot
The driver script below assumes the OptimizedChatBot class defined in the project source; it is the entry point that wires everything together:
import os, torch, gc, threading, time, traceback
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TextIteratorStreamer
from queue import Queue, Empty
import logging

# Silence tokenizer/log noise and tune the CUDA allocator and matmul behavior.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
torch.backends.cudnn.benchmark = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.set_float32_matmul_precision("high")
logging.getLogger("transformers").setLevel(logging.ERROR)

# Configuration constants.
BOT_NAME = "Senko"
PROMPT_FILE = "instructions_prompt.txt"
MODEL_ID = "senko-sleepy-fox/mistral-anime-ai"
RESPONSE_TIMEOUT = 300      # seconds to wait for a reply
MAX_CONTEXT_LENGTH = 10240  # tokens of context kept in the prompt
MAX_NEW_TOKENS = 8192       # upper bound on generated tokens
MEMORY_SIZE = 20            # conversation turns kept in memory

def main():
    bot = OptimizedChatBot()
    try:
        print("Initializing chatbot...")
        bot.load_system_prompt(BOT_NAME)
        bot.load_model()
        print(f"\n{'='*50}")
        print(f"{BOT_NAME} is ready! (Unlimited response length)")
        print("Commands:")
        print("  'exit'   - Quit the program")
        print("  'clear'  - Reset conversation memory")
        print("  'memory' - Show memory usage")
        print("  'status' - Show bot status")
        print(f"{'='*50}\n")
        conversation_count = 0
        while True:
            try:
                user_input = input("You: ").strip()
                if user_input.lower() == "exit":
                    print("Goodbye! 👋")
                    break
                elif user_input.lower() == "clear":
                    bot.memory = []
                    print("✅ Conversation memory cleared.")
                    continue
                elif user_input.lower() == "memory":
                    print(f"📊 {bot.get_memory_info()}")
                    continue
                elif user_input.lower() == "status":
                    status = "🟢 Ready" if not bot.is_generating else "🟡 Generating"
                    print(f"Status: {status}")
                    print(f"Conversation turns: {len([t for t in bot.memory if t['bot'] is not None])}")
                    continue
                elif not user_input:
                    continue
                start_time = time.time()
                prompt = bot.prepare_prompt(user_input)
                response = bot.generate_reply_with_timeout(prompt)
                if response:
                    # The reply is streamed to the console during generation,
                    # so only the elapsed time is printed here.
                    response_time = time.time() - start_time
                    print(f"[⏱️ {response_time:.2f}s]")
                else:
                    print("❌ Failed to generate response. Try again or type 'clear' to reset.")
                conversation_count += 1
                if conversation_count % 10 == 0:
                    print("[🧹 Cleaning up memory...]")
                    bot.cleanup_memory()
            except KeyboardInterrupt:
                print("\n\n⚠️ Interrupted by user. Exiting gracefully...")
                break
            except Exception as e:
                print(f"\n❌ Conversation error: {e}")
                traceback.print_exc()
                print("Continuing... (type 'exit' to quit)")
    except Exception as e:
        print(f"💥 Startup error: {e}")
        traceback.print_exc()
    finally:
        print("\n🧹 Performing final cleanup...")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.synchronize()
        gc.collect()
        print("✅ Cleanup completed. Goodbye!")

if __name__ == "__main__":
    torch.cuda.empty_cache()  # no-op when CUDA is uninitialized
    gc.collect()
    main()
✨ Features
- Anime-Themed Roleplay: The chatbot roleplays as an anime character, providing emotionally supportive responses.
- Long-Term Memory: It handles extended conversations by maintaining a capped conversation history (a sketch of the idea follows this list).
- GPU Optimization: Optimized for GPU usage, with quantization support to reduce memory consumption.
- Timeout Handling: A timeout on response generation prevents long-running generations from hanging the chat loop.
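The memory cap and periodic cleanup live inside OptimizedChatBot, whose source is not shown here. A minimal sketch of the idea, assuming the MEMORY_SIZE constant from the script (the project's actual cleanup_memory() may differ):

import gc
import torch

MEMORY_SIZE = 20  # value used in the script above

def cleanup_memory(bot):
    # Keep only the most recent turns so prompts stay inside the context window.
    bot.memory = bot.memory[-MEMORY_SIZE:]
    # Release cached GPU allocations and collect garbage between turns.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    gc.collect()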
📦 Installation
The installation mainly involves setting up the Python environment and installing the required libraries. You can use pip to install the necessary packages:
pip install transformers torch bitsandbytes
💻 Usage Examples
Basic Usage
bot = OptimizedChatBot()
bot.load_system_prompt(BOT_NAME)
bot.load_model()
user_input = "Hello, Senko!"
prompt = bot.prepare_prompt(user_input)
response = bot.generate_reply_with_timeout(prompt)
if response:
    print(response)
Advanced Usage
bot = OptimizedChatBot()
bot.load_system_prompt(BOT_NAME)
bot.load_model()
conversation_count = 0
while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        break
    elif user_input.lower() == "clear":
        bot.memory = []
        continue
    elif user_input.lower() == "memory":
        print(bot.get_memory_info())
        continue
    elif user_input.lower() == "status":
        status = "Ready" if not bot.is_generating else "Generating"
        print(f"Status: {status}")
        print(f"Conversation turns: {len([t for t in bot.memory if t['bot'] is not None])}")
        continue
    elif not user_input:
        continue
    prompt = bot.prepare_prompt(user_input)
    response = bot.generate_reply_with_timeout(prompt)
    if response:
        print(response)
    conversation_count += 1
    if conversation_count % 10 == 0:
        bot.cleanup_memory()
🔧 Technical Details
- Model Loading: The chatbot uses AutoTokenizer and AutoModelForCausalLM from the transformers library to load the model and tokenizer. It supports both 4-bit and 8-bit quantization for GPU usage.
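A minimal sketch of 4-bit quantized loading with bitsandbytes, reusing the MODEL_ID from the script above (the project's actual load_model() may choose its options differently):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

MODEL_ID = "senko-sleepy-fox/mistral-anime-ai"

# NF4 4-bit quantization stores weights in 4 bits and computes in fp16,
# cutting GPU memory use substantially.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)

Switching to 8-bit is a matter of passing load_in_8bit=True instead of the 4-bit options.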
- Prompt Preparation: The chatbot maintains a conversation history in memory and builds each prompt from the system prompt, that history, and the latest user input.
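Illustrative only, since prepare_prompt() lives in the project source: the general pattern is to concatenate the system prompt, the stored turns, and the new message. The turn dictionaries' 'bot' key matches the status command in the script; the 'user' key and speaker labels here are assumptions.

def prepare_prompt(system_prompt, memory, user_input, bot_name="Senko"):
    lines = [system_prompt]
    for turn in memory:
        lines.append(f"User: {turn['user']}")
        if turn["bot"] is not None:  # unanswered turns have bot=None
            lines.append(f"{bot_name}: {turn['bot']}")
    # End with the new message and the bot's name to cue the model's reply.
    lines.append(f"User: {user_input}")
    lines.append(f"{bot_name}:")
    return "\n".join(lines)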
- Response Generation: Responses are produced with the model's generate method through a streaming mechanism, with a timeout to guard against long-running generations.
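A simplified sketch of streamed generation with a timeout, using the same TextIteratorStreamer and queue imports as the script (the actual generate_reply_with_timeout() may differ):

import threading
from queue import Empty
from transformers import TextIteratorStreamer

def generate_with_timeout(model, tokenizer, prompt, timeout=300, max_new_tokens=8192):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # The streamer raises queue.Empty if no new token arrives within `timeout`.
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True, timeout=timeout
    )
    # Run generate() on a worker thread so this thread can consume the stream.
    thread = threading.Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens),
        daemon=True,
    )
    thread.start()
    pieces = []
    try:
        for chunk in streamer:  # decoded text, chunk by chunk
            print(chunk, end="", flush=True)
            pieces.append(chunk)
    except Empty:
        print("\n[Timed out waiting for the model]")
        return None
    thread.join()
    print()
    return "".join(pieces)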
📄 License
This project is licensed under the Apache-2.0 license.