🚀 luvai-phi3
This model is a fine-tuned version of microsoft/phi-3-mini-4k-instruct, optimized for roleplay conversations with a variety of character personas. It communicates in a conversational format. Note that following the prompt template guide below is essential for getting usable output.
🚀 Quick Start
The model excels at roleplay conversations, but it requires a specific prompt template to produce good output. Usage is described in detail below.
✨ Key Features
- Character consistency: the model is optimized to stay in character when adopting different personas.
- Creative dialogue: it excels at imaginative, character-driven conversations.
- High adaptability: it adapts readily to the different personality traits supplied in the system prompt.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "luvGPT/luvai-phi3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define character persona - you can customize this!
persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."

# Format the prompt with the raw format (not using chat template)
user_message = "Hi Sophie, can you tell me about yourself?"
prompt = f"{persona}\nUser: {user_message}\nAssistant:"

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Process the output
full_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = full_output[len(prompt):].strip()

# Sometimes the model may continue with "User:" - need to truncate
if "User:" in response:
    response = response.split("User:")[0].strip()

print(f"Character: {response}")
```
Advanced Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class CharacterChat:
    def __init__(self, model_path="luvGPT/luvai-phi3", persona=None):
        print(f"Loading model from {model_path}...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        # Default persona or use provided one
        if persona is None:
            self.persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant with a friendly and helpful personality. She's passionate about technology and enjoys explaining complex concepts in simple terms. She has a touch of humor and always maintains a positive attitude."
        else:
            self.persona = persona
        self.conversation_history = []
        print("Character is ready to chat!")

    def chat(self, message):
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": message})

        # Format the conversation in the raw format that works
        raw_prompt = f"{self.persona}\n"

        # Add conversation history
        for msg in self.conversation_history:
            if msg["role"] == "user":
                raw_prompt += f"User: {msg['content']}\n"
            else:
                raw_prompt += f"Assistant: {msg['content']}\n"

        # Add the final Assistant: prompt
        raw_prompt += "Assistant:"

        # Generate response
        inputs = self.tokenizer(raw_prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                pad_token_id=self.tokenizer.eos_token_id
            )

        # Decode full output
        full_output = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Extract just the response
        try:
            response = full_output[len(raw_prompt):].strip()
            # Sometimes the model may continue with "User:" - need to truncate
            if "User:" in response:
                response = response.split("User:")[0].strip()
            # Store the response in conversation history
            self.conversation_history.append({"role": "assistant", "content": response})
            return response
        except Exception:
            return "Error extracting response"

    def reset_conversation(self):
        self.conversation_history = []
        return "Conversation has been reset."

# Simple interactive chat example
if __name__ == "__main__":
    persona = input("Enter character persona (or press Enter for default): ")
    chat = CharacterChat(persona=persona if persona else None)
    print("Chat started! Type 'quit' to exit or 'reset' to restart conversation.")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == 'quit':
            print("Goodbye!")
            break
        elif user_input.lower() == 'reset':
            print(chat.reset_conversation())
            continue
        response = chat.chat(user_input)
        print(f"\nCharacter: {response}")
```
📦 Installation
The model uses the `transformers` library, which can be installed with:
```bash
pip install transformers
```
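The loading examples later in this document also rely on additional packages: `accelerate` is needed for `device_map="auto"`, and `bitsandbytes` backs the 8-bit and 4-bit `BitsAndBytesConfig` paths. If you plan to use those examples, install them as well:
```bash
pip install accelerate bitsandbytes
```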
📚 Documentation
Prompt Template Guide
Important: this model requires a specific prompt format to work properly. For best results, follow this exact template structure:
```
Character's Persona: [detailed persona description]
User: [user message]
Assistant:
```
The model will complete the response after "Assistant:". This exact format is critical; using the standard chat template or any other format will produce poor or incoherent output.
Key points:
- Start the character description with "Character's Persona:"
- Prefix every user message with "User: "
- Each model response follows "Assistant:"
- The model's responses typically begin with an emoji prefix (👅, 😈, 🔥, etc.)
- Preserve the format exactly as shown, including the newlines (a fully assembled multi-turn example follows this list)
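As an illustration, here is a minimal sketch of what an assembled multi-turn prompt looks like in this raw format; the persona and messages are placeholders, not part of the released examples:
```python
persona = "Sophie's Persona: Sophie is a knowledgeable virtual assistant..."

# History is serialized as alternating "User:"/"Assistant:" lines, and the
# prompt always ends with a bare "Assistant:" for the model to complete.
history = [
    ("User", "Hi Sophie, can you tell me about yourself?"),
    ("Assistant", "😊 Of course! I'm Sophie, your friendly tech guide."),
    ("User", "What do you enjoy explaining the most?"),
]
prompt = persona + "\n" + "\n".join(f"{role}: {text}" for role, text in history) + "\nAssistant:"
print(prompt)
```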
Example Conversation: Sophie Character (Knowledge Assistant)
You: i lost a lot of money in crypto
Sophie: 🙄 Sorry to hear that. How long have you been interested in cryptocurrencies?
You: mm im not really
Sophie: 🤔 Ahh okay then. Well, let me ask you this - have you ever heard of Bitcoin?
You: yes i have, why do you ask?
Sophie: 👍 Good! Bitcoin is actually one of the most popular cryptocurrencies out there. Do you know anything about how it works?
You: no tell me about it
Sophie: 😊 Sure thing! Bitcoin operates on a decentralized network called blockchain technology which allows for secure peer-to-peer transactions without the need for intermediaries like banks or governments. Each transaction is verified by nodes in the network before being added to a public ledger called the blockchain. This makes it very difficult for anyone to tamper with the data or manipulate the system.
Hardware Requirements
The model has been tested at different precision settings to determine minimum hardware requirements:

Precision | VRAM Required | Example GPUs | Notes
---|---|---|---
FP32 (32-bit) | 14.24 GB | RTX 3090, 4090, A5000, A6000, etc. | Default loading mode
FP16 (16-bit) | 7.12 GB | RTX 3090, 4090, A5000, A6000, etc. | Recommended for most users
8-bit quantized | 5.68 GB | RTX 2060 12GB, 3060, 3070, etc. | Good balance of quality and efficiency
4-bit quantized | 2.27 GB | Most modern GPUs (GTX 1060+) | Lowest quality, but runs on older hardware

Training Data
The dataset cannot be open-sourced at this time, as it is used for luvGPT's proprietary internal development. Initial conversations were generated by open-source LLMs following specific generation instructions and screened by a judge model, which scored the data and kept only the highest-quality examples showing strong persona consistency and engaging responses. The result is roughly 13k high-quality examples (filtered from 50k initial conversations) in JSONL format, where each entry contains a messages array with system, user, and assistant roles. Average message length is about 240 tokens, and conversations typically contain 6-7 messages.
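The FP32 and FP16 figures line up with simple weights-only arithmetic for a roughly 3.8B-parameter model (phi-3-mini's published size); the quantized figures run higher than the pure weights-only estimate because bitsandbytes keeps some layers in higher precision and adds overhead. A quick sketch of the estimate, assuming the 3.8B count:
```python
params = 3.8e9  # approximate parameter count of phi-3-mini (assumption)

bytes_per_param = {"FP32": 4, "FP16": 2, "8-bit": 1, "4-bit": 0.5}
for precision, nbytes in bytes_per_param.items():
    # Weights-only estimate; activations, KV cache, and quantization
    # overhead come on top of this.
    print(f"{precision}: ~{params * nbytes / 1024**3:.2f} GiB")
```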
Recommended Loading Code
High-End GPUs (FP16)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in half precision for best balance of performance and quality
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    torch_dtype=torch.float16,
    device_map="auto"
)
```
Mid-Range GPUs (8-bit)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

# Load in 8-bit
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    quantization_config=quantization_config,
    device_map="auto"
)
```
Low-End GPUs (4-bit)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

# Load in 4-bit
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    quantization_config=quantization_config,
    device_map="auto"
)
```
CPU-Only Inference (slower, but works on any system)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# CPU loading; the tokenizer is needed here too, as in the GPU examples
tokenizer = AutoTokenizer.from_pretrained("luvGPT/luvai-phi3")
model = AutoModelForCausalLM.from_pretrained(
    "luvGPT/luvai-phi3",
    device_map="cpu"
)
```
Note: lower precision (8-bit and 4-bit) may slightly reduce output quality, but for most use cases the difference is minor.
Model Description
The model is optimized to stay in character when adopting different personas. It excels at imaginative, character-driven dialogue and adapts readily to the personality traits supplied in the system prompt.
Performance
Training metrics show steady improvement throughout the run:
- Token accuracy: improved from ~0.48 to ~0.73
- Training loss: decreased from ~2.2 to ~1.05
- Convergence: the model showed strong convergence by the end of training
Training Details
- Base model: microsoft/phi-3-mini-4k-instruct
- Method: fine-tuning with LoRA/DeepSpeed, using the following parameters (a peft-style sketch follows this list):
  - LoRA rank: 16
  - LoRA alpha: 32
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training procedure:
  - Hardware: a single NVIDIA GPU with 24 GB VRAM
  - Training time: ~3 hours
  - Optimizer: AdamW with DeepSpeed ZeRO stage 2 optimization
  - Learning rate: 2e-4 with a cosine schedule
  - Batch size: 8 (effective)
  - Epochs: 3
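The training script itself is not published; as a hedged illustration, the LoRA hyperparameters above could be expressed with the peft library roughly as follows (the base-model loading details are assumptions):
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hyperparameters taken from the list above; everything else is illustrative.
lora_config = LoraConfig(
    r=16,           # LoRA rank
    lora_alpha=32,  # LoRA alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-3-mini-4k-instruct")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # LoRA trains only a small fraction of the weights
```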
🔧 Technical Details
The model is fine-tuned from microsoft/phi-3-mini-4k-instruct using LoRA for parameter-efficient fine-tuning, with DeepSpeed for training acceleration. A specific prompt template was used throughout training so the model learns the personas and conversational styles of different characters. With tuned training parameters (learning rate, batch size, number of epochs), the model performs well in roleplay conversations.
📄 License
This model is released under the MIT license.
Model Limitations
- The model works best with the specific prompt format described above
- While it can adapt to different personas, it retains some stylistic elements (such as emoji use) across characters
- The context window is limited to 4k tokens, inherited from the base Phi-3 model; see the history-trimming sketch after this list
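Because of the 4k-token window, long chats will eventually overflow the context. One way to handle this (a sketch under assumed budgets, not part of the released code) is to drop the oldest turns until the serialized prompt fits a token budget:
```python
def trim_history(tokenizer, persona, history, max_prompt_tokens=3500):
    """Drop the oldest turns until the raw prompt fits the token budget.

    `history` is a list of {"role": ..., "content": ...} dicts, as in the
    CharacterChat class above; 3500 is an assumed budget that leaves room
    for generation inside the 4k-token window.
    """
    while history:
        prompt = persona + "\n" + "".join(
            f"{'User' if m['role'] == 'user' else 'Assistant'}: {m['content']}\n"
            for m in history
        ) + "Assistant:"
        if len(tokenizer(prompt).input_ids) <= max_prompt_tokens:
            return history, prompt
        history = history[1:]  # drop the oldest message
    return history, persona + "\nAssistant:"
```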
Ethical Considerations
This model is intended for creative fiction writing and roleplay scenarios between consenting adults. Users should follow platform guidelines and local regulations when deploying it.
Acknowledgements
- Built on Microsoft's Phi-3 Mini model
- Training approach inspired by various LoRA fine-tuning methods
- Special thanks to the open-source AI community



