survey-finetuned-TinyLlama-1.1B-Chat-v1.0 open-source model - cross-domain survey response generation

Survey Finetuned TinyLlama 1.1B Chat V1.0

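Developed by aryashah00

A survey response generation model fine-tuned from TinyLlama, optimized for generating synthetic survey responses across domains

Large language model

Transformers

Language: English · License: MIT · Tags: #survey-generation #persona-based-responses #synthetic-data-generation

Downloads: 99

Released: 4/10/2025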

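Model overview

This model is a fine-tuned version of TinyLlama-1.1B-Chat. It was instruction-tuned on a custom dataset of survey responses, each written from a specific persona, and is suited to generating synthetic survey responses for different personas.

Model features

Multi-domain survey response generation

Covers 10 domains, including healthcare and education, and generates survey responses that match a given persona.

Persona role-play

Generates survey responses consistent with a detailed persona description.

Parameter-efficient fine-tuning

Fine-tuned with LoRA (r=16, alpha=32, dropout=0.05).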
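
To make the LoRA settings concrete, the sketch below works out the adapter scaling factor and the number of trainable parameters added to one adapted weight matrix, using the standard LoRA formulation (scaling = alpha / r). The 2048 x 2048 matrix shape is an assumption for illustration only, since the model card does not say which modules were adapted, and `LoraSettings` and `lora_stats` are hypothetical names rather than part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    r: int = 16            # adapter rank, from the model card
    alpha: int = 32        # scaling numerator, from the model card
    dropout: float = 0.05  # adapter dropout, from the model card


def lora_stats(settings: LoraSettings, d_in: int, d_out: int):
    """Return (scaling, added_params) for one adapted weight of shape d_out x d_in."""
    scaling = settings.alpha / settings.r
    # LoRA adds two low-rank factors: A (r x d_in) and B (d_out x r).
    added_params = settings.r * (d_in + d_out)
    return scaling, added_params


# ASSUMPTION: a square 2048 x 2048 projection, chosen only as an example shape.
scaling, added_params = lora_stats(LoraSettings(), d_in=2048, d_out=2048)
full_params = 2048 * 2048
print(f"scaling={scaling}, added={added_params}, fraction={added_params / full_params:.4f}")
```

With these settings the low-rank update is multiplied by 2.0, and the adapter for such a matrix trains a small fraction of the weights that full fine-tuning would update.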

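Model capabilities

Text generation

Dialogue systems

Survey response generation

Synthetic data generation

Use cases

Market research

Healthcare satisfaction surveys

Generates feedback on healthcare services from the perspective of different healthcare roles.

Can produce detailed responses that reflect a specific medical role's point of view.

Education assessment

Teaching effectiveness evaluation

Generates feedback on teaching effectiveness from students, teachers, and other roles.

Can simulate the response patterns of different roles in education.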

🚀 aryashah00/survey-finetuned-TinyLlama-1.1B-Chat-v1.0

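This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, optimized for generating synthetic survey responses across multiple domains. It was instruction-tuned on a custom dataset of survey responses, each of which reflects a specific persona.

🚀 Quick start

The model is designed to generate synthetic survey responses from different personas. It works best when given:

A detailed persona description
A specific survey question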
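
As a minimal sketch of how these two inputs can be combined, the helper below builds a system and user message pair using the prompt wording from the model card's usage example. `build_messages` is a hypothetical helper name, and the empty-input check is an added assumption rather than something the model requires.

```python
def build_messages(persona: str, question: str) -> list[dict[str, str]]:
    """Build the chat messages for one persona and one survey question."""
    persona = persona.strip()
    question = question.strip()
    if not persona or not question:
        # Assumption: reject empty inputs early, since both are needed for good output.
        raise ValueError("both a persona and a survey question are required")
    return [
        {
            "role": "system",
            "content": f"You are embodying the following persona: {persona}",
        },
        {
            "role": "user",
            "content": (
                f"Survey Question: {question}\n\n"
                "Please provide your honest and detailed response to this question."
            ),
        },
    ]


messages = build_messages(
    "A nurse who educates the child about modern medical treatments",
    "How often was your pain well controlled during this hospital stay?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the hand-written `messages` list in the usage example.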

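✨ Key features

Fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0 and optimized for synthetic survey response generation.
Instruction-tuned on a custom survey response dataset in which each response corresponds to a specific persona.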

💻 Usage examples

Basic usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "aryashah00/survey-finetuned-TinyLlama-1.1B-Chat-v1.0"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Define persona and question
persona = "A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare"
question = "How often was your pain well controlled during this hospital stay?"

# Prepare prompts (single braces, so the f-strings interpolate the variables)
system_prompt = f"You are embodying the following persona: {persona}"
user_prompt = f"Survey Question: {question}\n\nPlease provide your honest and detailed response to this question."

# Create message format
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize and move the tensors to the model's device
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )

# Decode only the newly generated tokens. Searching the decoded text for the
# prompt is unreliable because skip_special_tokens=True strips template tokens.
prompt_length = inputs["input_ids"].shape[-1]
generated_response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True).strip()

print(f"Generated response: {generated_response}")

Advanced usage

import requests

# Hosted inference endpoint for this model
API_URL = "https://api-inference.huggingface.co/models/aryashah00/survey-finetuned-TinyLlama-1.1B-Chat-v1.0"
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # replace with your own access token

def query(payload):
    """POST the payload to the inference endpoint and return the parsed JSON."""
    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    return response.json()

# Same persona and survey question as in the basic example
messages = [
    {"role": "system", "content": "You are embodying the following persona: A nurse who educates the child about modern medical treatments and encourages a balanced approach to healthcare"},
    {"role": "user", "content": "Survey Question: How often was your pain well controlled during this hospital stay?\n\nPlease provide your honest and detailed response to this question."}
]

output = query({"inputs": messages})
print(output)
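
The request above sends only the messages. As a sketch, the sampling settings from the basic example could be passed through the `parameters` field that the hosted text-generation API accepts. `build_payload` is a hypothetical helper, and the sketch assumes the endpoint takes a single prompt string in `inputs`; the model card does not confirm which input shape this deployment expects.

```python
def build_payload(prompt: str, max_new_tokens: int = 256,
                  temperature: float = 0.7, top_p: float = 0.9) -> dict:
    """Build a text-generation request body with explicit sampling settings."""
    return {
        "inputs": prompt,  # ASSUMPTION: a plain prompt string, not a message list
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "do_sample": True,
            # Ask for the completion only, without the prompt echoed back.
            "return_full_text": False,
        },
    }


payload = build_payload(
    "Survey Question: How often was your pain well controlled during this hospital stay?"
)
print(payload["parameters"])
```

The returned dictionary can be passed to `query` in place of `{"inputs": messages}`.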

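📚 Documentation

Model information

Attribute	Details
Model type	Text generation model fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0
Training data	About 3,000 examples across 10 domains, including healthcare and education, in ChatML instruction format with system and user prompts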
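
To illustrate what a ChatML-style training example with system and user prompts might look like, the sketch below renders one record with the generic ChatML delimiters (`<|im_start|>` and `<|im_end|>`). The card does not show the exact delimiters used, and the base model's own chat template may differ, so this is an assumption; `tokenizer.apply_chat_template` remains the authoritative way to format prompts. The assistant text is a placeholder, not real training data.

```python
def to_chatml(messages: list[dict[str, str]]) -> str:
    """Render chat messages using generic ChatML delimiters."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    return "\n".join(parts)


# Hypothetical record: the persona and question come from the usage example,
# and the assistant turn is a placeholder for a persona-specific answer.
record = [
    {"role": "system", "content": "You are embodying the following persona: A nurse"},
    {"role": "user", "content": "Survey Question: How often was your pain well controlled during this hospital stay?"},
    {"role": "assistant", "content": "<persona-specific survey response>"},
]

rendered = to_chatml(record)
print(rendered)
```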

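Training details

Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Training method: parameter-efficient fine-tuning with LoRA
LoRA parameters: r=16, alpha=32, dropout=0.05
Training settings:
- Batch size: 8
- Learning rate: 0.0002
- Epochs: 5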
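
For a rough sense of the training length these settings imply, the arithmetic below estimates the number of optimizer steps. It assumes exactly 3,000 training examples (the card says "about 3,000"), no gradient accumulation, a single device, and no held-out validation split; none of these are confirmed by the model card.

```python
import math

num_examples = 3000  # ASSUMPTION: the card only says "about 3,000"
batch_size = 8       # from the model card
epochs = 5           # from the model card

# One optimizer step per batch, with a final partial batch rounded up.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

Under these assumptions the run would take 375 steps per epoch and 1,875 steps in total.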

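Limitations

The model is optimized for survey response generation and may perform poorly on other tasks.
Response quality depends on how clear and specific the persona and question are.
The model may occasionally generate responses that do not fully match the given persona.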

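🔧 Technical details

The model is fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0. It was instruction-tuned on a custom survey response dataset so that it can generate survey responses for different personas. Training used parameter-efficient fine-tuning (LoRA) with r=16, alpha=32, and dropout=0.05, together with a batch size of 8, a learning rate of 0.0002, and 5 epochs. These settings were chosen so that the model could learn the patterns in the dataset.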