StableVicuna-13B: an open-source chat model fine-tuned for a range of conversational scenarios

Stable Vicuna 13b Delta

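Developed by CarperAI

StableVicuna-13B is a Vicuna-13B v0 model fine-tuned with reinforcement learning from human feedback (RLHF), using proximal policy optimization (PPO), on a mix of conversational and instruction datasets.

Tags: large language model, Transformers, English, RLHF fine-tuning, multi-turn dialogue, instruction following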

Released: April 26, 2023

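Model Overview

StableVicuna-13B is an autoregressive language model based on the LLaMA transformer architecture, focused on text generation for conversational tasks.

Model Features

Reinforcement learning fine-tuning

Fine-tuned with RLHF via PPO on a mix of conversational and instruction datasets.

Multi-dataset training

Trained on a mix of datasets that includes OpenAssistant, GPT4All, and Alpaca.

Dialogue optimization

Focused on text generation for conversational tasks, with the aim of producing coherent, relevant replies.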

Model Capabilities

Text generation

Dialogue systems

Instruction following

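Use Cases

Dialogue systems

Intelligent assistants

For building assistants that understand and respond to user instructions and questions.

Text generation

Code generation

Generates code snippets, such as Python scripts, from user instructions.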

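🚀 StableVicuna-13B

StableVicuna-13B is a model fine-tuned with reinforcement learning from human feedback (RLHF), via proximal policy optimization (PPO), on a mix of conversational and instruction datasets. It is intended for conversational text generation.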

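🚀 Quick Start

Applying the Delta Weights

StableVicuna-13B cannot be used from the CarperAI/stable-vicuna-13b-delta weights alone. The repository stores only the difference between the fine-tuned model and LLaMA 13B, so to obtain the correct model you must add the delta weights back onto the original LLaMA 13B weights. The apply_delta.py script automates the conversion; run it as follows: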

python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
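Conceptually, applying a delta is an elementwise addition of each delta tensor to the base tensor with the same name. The sketch below illustrates only that arithmetic on plain Python lists. It is not the apply_delta.py script, whose implementation is not reproduced here and may handle further details. The function name `apply_delta_sketch` and the toy parameter names are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_delta_sketch(base: Weights, delta: Weights) -> Weights:
    """Return target weights where target[name] = base[name] + delta[name]."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    target: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Elementwise sum: fine-tuned weight = base weight + stored difference.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy example with made-up values (real checkpoints hold large tensors).
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -1.0], "layer.bias": [0.5]}
target = apply_delta_sketch(base, delta)
print(target)
```

Because only the difference is distributed, the result is usable only by someone who already has the base LLaMA 13B weights.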

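Get Started

After applying the delta weights, you can start chatting with the model using the transformers library. Following the Vicuna team's recommendation for Vicuna v0, install this version of transformers: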

pip install git+https://github.com/huggingface/transformers@c612628045822f909020f7eb6784c79700813eda

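from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model produced by apply_delta.py
tokenizer = AutoTokenizer.from_pretrained("path/to/stable-vicuna-13b-applied")
model = AutoModelForCausalLM.from_pretrained("path/to/stable-vicuna-13b-applied")
# Cast the weights to fp16 and move the model to the GPU
model.half().cuda()

# Prompt format: "### Human: <message>" followed by "### Assistant:".
# The trailing backslashes suppress the newlines at those positions,
# so the prompt ends exactly at "### Assistant:".
prompt = """\
### Human: Write a Python script for text classification using Transformers and PyTorch
### Assistant:\
"""

inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
# Sample up to 256 new tokens; temperature=1.0 and top_p=1.0 leave the
# model's output distribution unmodified
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
)
# The decoded text contains the prompt followed by the model's reply
print(tokenizer.decode(tokens[0], skip_special_tokens=True))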
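The script above hard-codes a single-turn prompt. A minimal sketch of helpers for building the same "### Human: / ### Assistant:" format and for trimming the decoded output down to the reply might look as follows. The single-turn layout comes from the example above; joining earlier turns in the same pattern is an assumption, and the names `build_prompt` and `extract_reply` are hypothetical.

```python
from typing import List, Tuple

HUMAN_TAG = "### Human:"
ASSISTANT_TAG = "### Assistant:"


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Format earlier (human, assistant) turns plus a new user message."""
    lines = []
    for human, assistant in history:
        lines.append(f"{HUMAN_TAG} {human}")
        lines.append(f"{ASSISTANT_TAG} {assistant}")
    lines.append(f"{HUMAN_TAG} {user_message}")
    # End on the bare assistant tag so the model continues with its reply.
    lines.append(ASSISTANT_TAG)
    return "\n".join(lines)


def extract_reply(decoded: str, prompt: str) -> str:
    """Strip the echoed prompt and anything after a following human tag."""
    # Decoding may not reproduce the prompt exactly; fall back to the full text.
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return reply.split(HUMAN_TAG, 1)[0].strip()


prompt = build_prompt([], "Hi")
reply = extract_reply(prompt + " Hello!\n### Human: bye", prompt)
print(repr(prompt), repr(reply))
```

Passing the result of `build_prompt` to the tokenizer in place of the literal prompt keeps the generation code unchanged.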

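✨ Key Features

Fine-tuned on multiple datasets: the model is fine-tuned on several conversational and instruction datasets to improve its performance on dialogue tasks.
Reinforcement learning optimization: RLHF via proximal policy optimization (PPO) is used to bring the generated text closer to human preferences.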

📦 Installation

Follow the quick-start steps above: apply the delta weights first, then install the specified version of the transformers library.

💻 Usage Examples

Basic Usage

The basic usage example is the same script shown in the quick-start section above: load the tokenizer and model from the delta-applied checkpoint, build a prompt in the "### Human: / ### Assistant:" format, and call model.generate.

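📚 Documentation

Model Details

Attribute	Details
Trained by	Duy Phung of CarperAI
Model type	StableVicuna-13B is an autoregressive language model based on the LLaMA transformer architecture
Language	English
Library	trlX
License for delta weights	CC-BY-NC-SA-4.0
License for base LLaMA model weights	Meta's non-commercial bespoke license
Contact	For questions and comments about the model, visit the CarperAI and StableFoundation Discord servers

Hyperparameter	Value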
\(n_\text{parameters}\)	13B
\(d_\text{model}\)	5120
\(n_\text{layers}\)	40
\(n_\text{heads}\)	40
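As a rough consistency check, the table's values can be combined into a per-head dimension and an approximate parameter count. The sketch below assumes a LLaMA-style layer layout (four attention projections, a three-matrix feed-forward block, two normalization vectors per layer, untied input and output embeddings, no biases). The feed-forward width of 13824 and the vocabulary size of 32000 are not stated in this document; they are assumptions taken from the publicly known LLaMA 13B configuration.

```python
# Values from the hyperparameter table above.
d_model = 5120
n_layers = 40
n_heads = 40

# Assumptions (not given in this document).
d_ffn = 13824       # feed-forward hidden width
vocab_size = 32000  # tokenizer vocabulary size

head_dim = d_model // n_heads

attention = 4 * d_model * d_model      # query, key, value, output projections
feed_forward = 3 * d_model * d_ffn     # gate, up, down projections
norms = 2 * d_model                    # two normalization weight vectors
per_layer = attention + feed_forward + norms

embeddings = 2 * vocab_size * d_model  # input embedding + output head
final_norm = d_model

total = n_layers * per_layer + embeddings + final_norm
print(f"head_dim = {head_dim}")
print(f"approximate parameters = {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands close to the 13B figure in the table, which suggests the listed dimensions are mutually consistent.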

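Training

Training Dataset

StableVicuna-13B is fine-tuned on a mix of three datasets:

OpenAssistant Conversations Dataset (OASST1): a human-generated, human-annotated corpus of assistant-style conversations comprising 161,443 messages distributed across 66,497 conversation trees, in 35 different languages.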
GPT4All Prompt Generations: a dataset of 400k prompts and responses generated by GPT-3.5-Turbo.
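Alpaca: a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine.

The reward model used during RLHF was also trained on the OpenAssistant Conversations Dataset (OASST1), along with two other datasets:

Anthropic HH-RLHF: a dataset of preferences about AI assistant helpfulness and harmlessness.
Stanford Human Preferences Dataset: a dataset of 385K collective human preferences over responses to questions and instructions in 18 different subject areas, from cooking to legal advice.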
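This document does not state the objective used to train the reward model. A common choice for preference datasets like those listed above is a pairwise ranking loss, which pushes the reward of the preferred response above that of the rejected one. The sketch below shows that loss for a single pair, as an assumption about the general technique rather than a description of what CarperAI used. The function name is hypothetical.

```python
import math


def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) for one pair."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss shrinks as the preferred response is scored further above the other.
for chosen, rejected in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]:
    print(chosen, rejected, round(pairwise_ranking_loss(chosen, rejected), 4))
```

When both responses receive the same reward the loss equals log 2, and it grows roughly linearly once the rejected response is scored higher.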

Training Procedure

CarperAI/stable-vicuna-13b-delta was trained with PPO as implemented in trlX, using the following configuration:

Hyperparameter	Value
num_rollouts	128
chunk_size	16
ppo_epochs	4
init_kl_coef	0.1
target	6
horizon	10000
gamma	1
lam	0.95
cliprange	0.2
cliprange_value	0.2
vf_coef	1.0
scale_reward	None
cliprange_reward	10
generation_kwargs
max_length	512
min_length	48
top_k	0.0
top_p	1.0
do_sample	True
temperature	1.0
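To make the roles of several values in the table concrete, the sketch below shows textbook forms of three PPO components: an adaptive KL coefficient (init_kl_coef, target, horizon), generalized advantage estimation (gamma, lam), and the clipped policy loss (cliprange). The table gives only the values; the formulas are assumptions based on the commonly used definitions, and whether trlX matches them in every detail is not confirmed here. All function and class names are hypothetical.

```python
import math
from typing import List


class AdaptiveKLController:
    """Nudges the KL penalty coefficient toward a target KL value."""

    def __init__(self, init_kl_coef: float = 0.1, target: float = 6.0,
                 horizon: int = 10000) -> None:
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error clipped to [-0.2, 0.2]; this bound is a fixed constant
        # of the scheme assumed here and is unrelated to cliprange in the table.
        error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        self.value *= 1.0 + error * n_steps / self.horizon


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one finished trajectory."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        td_error = rewards[t] + gamma * next_value - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_policy_loss(logprob_new: float, logprob_old: float,
                        advantage: float, cliprange: float = 0.2) -> float:
    """PPO clipped surrogate loss for a single token (to be minimized)."""
    ratio = math.exp(logprob_new - logprob_old)
    clipped_ratio = max(1.0 - cliprange, min(1.0 + cliprange, ratio))
    return max(-advantage * ratio, -advantage * clipped_ratio)


controller = AdaptiveKLController()
controller.update(current_kl=12.0, n_steps=128)  # observed KL above the target
print(controller.value)
print(gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0]))
print(clipped_policy_loss(math.log(2.0), 0.0, advantage=1.0))
```

With gamma set to 1, rewards are not discounted over time, so lam alone controls how quickly credit for a late reward decays toward earlier tokens.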

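Use and Limitations

Intended Use

The model is intended for text generation, with a focus on conversational tasks. Users may further fine-tune the model on their own data to improve its performance on specific tasks, in accordance with the non-commercial license.

Limitations and Bias

The base LLaMA model is trained on a variety of data, some of which may contain offensive, harmful, and biased content that can lead to undesirable model behavior. See Section 5.1 of the LLaMA paper. We have not performed any studies to determine how fine-tuning on the datasets listed above affects the model's behavior and toxicity. Do not treat chat responses from this model as a substitute for human judgment or as a source of truth, and use it with caution.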

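🔧 Technical Details

StableVicuna-13B is based on the Vicuna-13B v0 model and fine-tuned with RLHF via proximal policy optimization (PPO). Training used several conversational and instruction datasets to improve performance on dialogue tasks. The reward model was likewise trained on several preference datasets, so that the generated text better matches human preferences.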

📄 License

The delta weights are licensed under CC-BY-NC-SA-4.0. The base LLaMA model weights are covered by Meta's non-commercial bespoke license.

Acknowledgements

This work would not have been possible without the support of Stability AI.

Citations

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

@misc{vicuna2023,
    title = {Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality},
    url = {https://vicuna.lmsys.org},
    author = {Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.},
    month = {March},
    year = {2023}
}

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

@software{leandro_von_werra_2023_7790115,
  author       = {Leandro von Werra and
                  Alex Havrilla and
                  Max reciprocated and
                  Jonathan Tow and
                  Aman cat-state and
                  Duy V. Phung and
                  Louis Castricato and
                  Shahbuland Matiana and
                  Alan and
                  Ayush Thakur and
                  Alexey Bukhtiyarov and
                  aaronrmm and
                  Fabrizio Milo and
                  Daniel and
                  Daniel King and
                  Dong Shin and
                  Ethan Kim and
                  Justin Wei and
                  Manuel Romero and
                  Nicky Pochinkov and
                  Omar Sanseviero and
                  Reshinth Adithyan and
                  Sherman Siu and
                  Thomas Simonini and
                  Vladimir Blagojevic and
                  Xu Song and
                  Zack Witten and
                  alexandremuzio and
                  crumb},
  title        = {{CarperAI/trlx: v0.6.0: LLaMa (Alpaca), Benchmark 
                   Util, T5 ILQL, Tests}},
  month        = mar,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.6.0},
  doi          = {10.5281/zenodo.7790115},
  url          = {https://doi.org/10.5281/zenodo.7790115}
}