🚀 luvai-phi3
This model is a fine-tuned version of microsoft/phi-3-mini-4k-instruct, optimized for roleplay conversations with a variety of character personas. The model communicates in a conversational format. Note that following the prompt template guidelines is essential for getting usable output.
🚀 Quick Start
This model excels at roleplay conversations but requires a specific prompt template to produce good output. Usage is described in detail below.
✨ Key Features
- Character consistency: the model is optimized to stay in character across different personas.
- Creative dialogue: it excels at imaginative, character-driven conversations.
- High adaptability: it adapts readily to the different personality traits supplied in the system prompt.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "luvGPT/luvai-phi3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define character persona - you can customize this!
persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."

# Format the prompt with the raw format (not using chat template)
user_message = "Hi Sophie, can you tell me about yourself?"
prompt = f"{persona}\nUser: {user_message}\nAssistant:"

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Process the output
full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = full_output[len(prompt):].strip()

# Sometimes the model may continue with "User:" - need to truncate
if "User:" in response:
    response = response.split("User:")[0].strip()

print(f"Character: {response}")
```
Advanced Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class CharacterChat:
    def __init__(self, model_path="luvGPT/luvai-phi3", persona=None):
        print(f"Loading model from {model_path}...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto"
        )

        # Default persona or use provided one
        if persona is None:
            self.persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."
        else:
            self.persona = persona

        self.conversation_history = []
        print("Character is ready to chat!")

    def chat(self, message):
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": message})

        # Format the conversation in the raw format that works
        raw_prompt = f"{self.persona}\n"

        # Add conversation history
        for msg in self.conversation_history:
            if msg["role"] == "user":
                raw_prompt += f"User: {msg['content']}\n"
            else:
                raw_prompt += f"Assistant: {msg['content']}\n"

        # Add the final Assistant: prompt
        raw_prompt += "Assistant:"

        # Generate response
        inputs = self.tokenizer(raw_prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                pad_token_id=self.tokenizer.eos_token_id
            )

        # Decode full output
        full_output = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract just the response
        try:
            response = full_output[len(raw_prompt):].strip()

            # Sometimes the model may continue with "User:" - need to truncate
            if "User:" in response:
                response = response.split("User:")[0].strip()

            # Store the response in conversation history
            self.conversation_history.append({"role": "assistant", "content": response})
            return response
        except Exception:
            return "Error extracting response"

    def reset_conversation(self):
        self.conversation_history = []
        return "Conversation has been reset."

# Simple interactive chat example
if __name__ == "__main__":
    persona = input("Enter character persona (or press Enter for default): ")
    chat = CharacterChat(persona=persona if persona else None)

    print("Chat started! Type 'quit' to exit or 'reset' to restart conversation.")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == 'quit':
            print("Goodbye!")
            break
        elif user_input.lower() == 'reset':
            print(chat.reset_conversation())
            continue

        response = chat.chat(user_input)
        print(f"\nCharacter: {response}")
```
📦 Installation
This model uses the transformers library, which can be installed with:

```bash
pip install transformers
```
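The loading examples below also use `device_map="auto"` and bitsandbytes quantization; for those you will likely need the accelerate and bitsandbytes packages as well (an assumption based on standard transformers usage, not stated by the model authors):

```bash
pip install accelerate bitsandbytes
```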
📚 Documentation
Prompt Template Guide
Important: this model requires a specific prompt format to work properly.
For best results, follow this exact template structure:
```
Character's Persona: [detailed persona description]
User: [user message]
Assistant:
```
The model will complete its response after "Assistant:". This specific format is critical: using the standard chat template or any other format will produce poor or nonsensical output.
Key points:
- Start the character description with "Character's Persona:"
- Prefix every user message with "User: "
- Each model response follows "Assistant:"
- The model's responses typically include an emoji prefix (👅, 😈, 🔥, etc.)
- Preserve the format exactly as shown, including the newlines (a prompt-building sketch follows this list)
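For illustration, here is a minimal helper that assembles a multi-turn prompt in this raw format (`build_prompt` is a hypothetical name of our own, not part of the model's tooling):

```python
def build_prompt(persona: str, history: list[tuple[str, str]]) -> str:
    """Assemble the raw prompt format described above.

    history is a list of (role, text) pairs, where role is "user" or "assistant".
    """
    lines = [persona]
    for role, text in history:
        prefix = "User" if role == "user" else "Assistant"
        lines.append(f"{prefix}: {text}")
    lines.append("Assistant:")  # the model completes after this marker
    return "\n".join(lines)

prompt = build_prompt(
    "Sophie's Persona: Sophie is a knowledgeable virtual assistant...",
    [("user", "Hi Sophie, can you tell me about yourself?")],
)
```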
Example Dialogue: Sophie Character (Knowledge Assistant)
You: i lost a lot of money in crypto
Sophie: 🙄 Sorry to hear that. How long have you been interested in cryptocurrencies?
You: mm im not really
Sophie: 🤔 Ahh okay then. Well, let me ask you this - have you ever heard of Bitcoin?
You: yes i have, why do you ask?
Sophie: 👍 Good! Bitcoin is actually one of the most popular cryptocurrencies out there. Do you know anything about how it works?
You: no tell me about it
Sophie: 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.
Hardware Requirements
The model has been tested at different precision settings to determine minimum hardware requirements:
Precision | VRAM Required | Example GPUs | Notes |
---|---|---|---|
FP32 (32-bit) | 14.24 GB | RTX 3090, 4090, A5000, A6000, etc. | Default loading mode |
FP16 (16-bit) | 7.12 GB | RTX 3090, 4090, A5000, A6000, etc. | Recommended for most users |
8-bit quantization | 5.68 GB | RTX 2060 12GB, 3060, 3070, etc. | Good balance of quality and efficiency |
4-bit quantization | 2.27 GB | Most modern GPUs (GTX 1060+) | Lowest quality, but runs on older hardware |
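As a convenience, one could choose a loading mode automatically from the available GPU memory. The following is a minimal sketch of our own based on the VRAM figures above; the `pick_loading_mode` helper and its thresholds are not part of the model's tooling:

```python
import torch

def pick_loading_mode() -> str:
    """Suggest a loading mode from total GPU memory (thresholds from the table above)."""
    if not torch.cuda.is_available():
        return "cpu"
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb >= 8:    # FP16 needs ~7.12 GB
        return "fp16"
    if total_gb >= 6:    # 8-bit needs ~5.68 GB
        return "8bit"
    return "4bit"        # 4-bit needs ~2.27 GB
```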
Training Data
The dataset was used for luvGPT's proprietary internal development and cannot currently be open-sourced. Initial conversations were generated by open-source large language models following specific generation instructions and screened by a judge model. The dataset contains roughly 13k high-quality examples (filtered from 50k initial conversations) in JSONL format, where each entry contains a messages array with system, user, and assistant roles. The judge model scored and filtered the initial dataset, keeping only the highest-quality examples showing strong persona consistency and engaging responses. Average message length is about 240 tokens, and conversations typically contain 6-7 messages.
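For illustration only, a single entry in that JSONL format might look like the following (hypothetical content; the actual dataset is not public):

```json
{"messages": [{"role": "system", "content": "Sophie's Persona: Sophie is a knowledgeable virtual assistant..."}, {"role": "user", "content": "Hi Sophie, can you tell me about yourself?"}, {"role": "assistant", "content": "😊 Of course! I love helping people learn about technology."}]}
```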
Recommended Loading Code
High-End GPUs (FP16)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half precision for best balance of performance and quality
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    torch_dtype=torch.float16,
    device_map="auto"
)
```
Mid-Range GPUs (8-bit)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

# Load in 8-bit
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    quantization_config=quantization_config,
    device_map="auto"
)
```
Low-End GPUs (4-bit)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load in 4-bit
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    quantization_config=quantization_config,
    device_map="auto"
)
```
CPU-Only Inference (slower, but works on any system)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    device_map="cpu"
)
```
Note: lower precision (8-bit and 4-bit) may slightly reduce output quality, but the difference is usually minimal for most use cases.
Model Description
The model is optimized to stay in character across different personas. It excels at imaginative, character-driven conversations and adapts readily to the different personality traits supplied in the system prompt.
Performance
Training metrics show steady improvement throughout training:
- Token accuracy: improved from ~0.48 to ~0.73
- Training loss: decreased from ~2.2 to ~1.05
- Convergence: the model showed strong convergence by the end of training
Training Details
- Base model: microsoft/phi-3-mini-4k-instruct
- Method: fine-tuned with LoRA and DeepSpeed, using the following parameters (a configuration sketch follows this list):
  - LoRA rank: 16
  - LoRA alpha: 32
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training procedure:
  - Hardware: a single NVIDIA GPU with 24 GB VRAM
  - Training time: about 3 hours
  - Optimizer: AdamW with DeepSpeed ZeRO stage 2 optimization
  - Learning rate: 2e-4 with a cosine schedule
  - Batch size: 8 (effective)
  - Epochs: 3
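Since the exact training code is not published, the following is only a sketch of how the listed LoRA hyperparameters might be expressed with the peft library's `LoraConfig`; unlisted settings are peft defaults, not values stated by the model authors:

```python
from peft import LoraConfig

# Mirrors the hyperparameters listed above; dropout and other options
# were not specified, so peft defaults are used here.
lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```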
🔧 Technical Details
This model was fine-tuned from microsoft/phi-3-mini-4k-instruct using LoRA for parameter-efficient fine-tuning, with DeepSpeed for training acceleration. A specific prompt template was used during training so the model would learn the personas and conversational styles of different characters. With tuned training parameters such as learning rate, batch size, and number of epochs, the model performs well in roleplay conversations.
📄 License
This model is released under the MIT License.
Model Limitations
- The model works best with the specific prompt format described above
- While the model can adapt to different personas, it retains some stylistic elements across characters (such as emoji usage)
- The model's context window is limited to 4k tokens, inherited from the base Phi-3 model
Ethical Considerations
This model is intended for creative fiction writing and roleplay scenarios between consenting adults. Users should follow platform guidelines and local regulations when deploying this model.
Acknowledgements
- Based on Microsoft's Phi-3 Mini model
- Training methodology inspired by various LoRA fine-tuning approaches
- Special thanks to the open-source AI community



