StableVicuna-13B Open-Source Chat Model: Fine-Tuned for a Range of Conversational Scenarios

Stable Vicuna 13b Delta

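Developed by CarperAI

StableVicuna-13B is a Vicuna-13B v0 model fine-tuned with reinforcement learning from human feedback (RLHF), using proximal policy optimization (PPO), on a variety of conversational and instruction datasets.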

Large Language Model

Transformers

English #RLHF fine-tuning #multi-turn dialogue optimization #instruction following

Downloads: 31

Released: 4/26/2023

Model Overview

StableVicuna-13B is an auto-regressive language model based on the LLaMA transformer architecture, focused on text generation for conversational tasks.

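Model Features

Reinforcement learning fine-tuning

Fine-tuned with reinforcement learning from human feedback (RLHF), using proximal policy optimization (PPO), on a variety of conversational and instruction datasets.

Multi-dataset training

Trained on a mix of datasets, including OpenAssistant, GPT4All, and Alpaca.

Dialogue optimization

Focused on text generation for conversational tasks, producing coherent, relevant responses.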

Model Capabilities

Text generation

Dialogue systems

Instruction following

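Use Cases

Dialogue systems

Intelligent assistants

Building assistants that understand and respond to user instructions and questions.

Text generation

Code generation

Generating code snippets, such as Python scripts, from user instructions.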

🚀 StableVicuna-13B

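StableVicuna-13B is a model fine-tuned with reinforcement learning from human feedback (RLHF), via proximal policy optimization (PPO), on a variety of conversational and instruction datasets. It is aimed at conversational tasks and text generation.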

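🚀 Quick Start

Apply Delta Weights

StableVicuna-13B cannot be used from the CarperAI/stable-vicuna-13b-delta weights alone. To obtain the correct model, the delta stored in CarperAI/stable-vicuna-13b-delta (the difference between the fine-tuned weights and LLaMA 13B) must be added back onto the LLaMA 13B weights. We provide the apply_delta.py script to automate the conversion, which you can run as follows: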

python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
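
Conceptually, applying a delta means adding each delta tensor to the base tensor of the same name. The sketch below illustrates that idea on plain Python lists. It is not the apply_delta.py script: the function name `apply_delta_toy` and the toy parameter names are hypothetical, and the real script works on full checkpoints and may handle details (such as vocabulary-size differences) that this toy ignores.

```python
def apply_delta_toy(base, delta):
    """Return target weights where target[name] = base[name] + delta[name].

    Hypothetical helper for illustration only; weights are flat lists of floats.
    """
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    target = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise sum recovers the fine-tuned value for this parameter.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy example with made-up parameter names and values.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -0.5], "layer.bias": [0.0]}
target = apply_delta_toy(base, delta)
print(target)
```

Distributing only the delta keeps the original LLaMA weights out of the release, which is why both inputs are needed to reconstruct the model.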

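Getting Started

Once the delta weights are applied, you can start chatting with the model using the transformers library. Following a suggestion from the Vicuna team for Vicuna v0, install this version of transformers: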

pip install git+https://github.com/huggingface/transformers@c612628045822f909020f7eb6784c79700813eda

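from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the merged weights produced by apply_delta.py.
tokenizer = AutoTokenizer.from_pretrained("path/to/stable-vicuna-13b-applied")
model = AutoModelForCausalLM.from_pretrained("path/to/stable-vicuna-13b-applied")
# Cast to float16 and move the model to the GPU.
model.half().cuda()

# The prompt ends with an open "### Assistant:" tag for the model to complete.
prompt = """\
### Human: Write a Python script for text classification using Transformers and PyTorch
### Assistant:\
"""

inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
# Sample up to 256 new tokens; temperature=1.0 and top_p=1.0 apply
# no temperature scaling and no nucleus truncation.
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
)
# The decoded string contains the prompt followed by the generated reply.
print(tokenizer.decode(tokens[0], skip_special_tokens=True))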
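
For multi-turn chats it can help to build the prompt programmatically and to strip the echoed prompt from the decoded output. The sketch below is one way to do that. The helper names `build_prompt` and `extract_reply` are hypothetical, and joining turns with a single newline is an assumption extrapolated from the one-turn example above, not a documented format.

```python
def build_prompt(turns):
    """Format (speaker, text) turns and end with an open Assistant tag.

    `turns` is a list of ("human" | "assistant", text) pairs. The newline
    separator between turns is an assumption, not a documented format.
    """
    labels = {"human": "### Human:", "assistant": "### Assistant:"}
    lines = [f"{labels[speaker]} {text}" for speaker, text in turns]
    lines.append("### Assistant:")
    return "\n".join(lines)


def extract_reply(decoded, prompt):
    """Strip the echoed prompt and cut at the next "### Human:" marker, if any."""
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return reply.split("### Human:")[0].strip()


prompt = build_prompt([("human", "Name one use of a tokenizer.")])
print(prompt)
```

Cutting at the next `### Human:` marker guards against the model continuing the conversation on the user's behalf, which sampled completions sometimes do.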

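✨ Key Features

Fine-tuned on multiple datasets: fine-tuning on several conversational and instruction datasets improves the model's performance on dialogue tasks.
Reinforcement learning optimization: RLHF via proximal policy optimization (PPO) steers generated text toward human preferences.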

📦 Installation

See the Quick Start section above: first apply the delta weights, then install the specified version of the transformers library.

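📚 Detailed Documentation

Model Details

Property	Details
Trained by	Duy Phung of CarperAI
Model type	StableVicuna-13B is an auto-regressive language model based on the LLaMA transformer architecture
Language	English
Library	trlX
License for delta weights	CC-BY-NC-SA-4.0
License for base LLaMA model weights	Meta's non-commercial bespoke license
Contact	For questions and comments about the model, visit the CarperAI and StableFoundation Discord servers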

Hyperparameter	Value
\(n_\text{parameters}\)	13B
\(d_\text{model}\)	5120
\(n_\text{layers}\)	40
\(n_\text{heads}\)	40
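
The table values can be cross-checked with a little arithmetic. The sketch below derives the per-head dimension and a parameter estimate. The feed-forward width `d_ff` and the vocabulary size are not given in this card; they are assumptions taken from the published LLaMA 13B configuration, and a fine-tune that adds tokens would shift the total slightly.

```python
# Values from the hyperparameter table above.
d_model = 5120
n_layers = 40
n_heads = 40

# Assumptions from the published LLaMA 13B configuration, not from this card.
d_ff = 13824        # width of the gated feed-forward block
vocab_size = 32000  # base LLaMA vocabulary; fine-tunes may add tokens

head_dim = d_model // n_heads

attention = 4 * d_model * d_model   # Q, K, V and output projections, no biases
feed_forward = 3 * d_model * d_ff   # gate, up and down projections
norms = 2 * d_model                 # two RMSNorm weight vectors per layer
per_layer = attention + feed_forward + norms

embeddings = 2 * vocab_size * d_model  # untied input embedding and output head
total = n_layers * per_layer + embeddings + d_model  # plus the final norm

print(f"head_dim = {head_dim}")
print(f"{total / 1e9:.2f}B parameters")
print(f"{total * 2 / 2**30:.1f} GiB of weights in float16")
```

The float16 figure counts weights only; activations and the attention cache need additional memory at inference time.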

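Training

Training Dataset

StableVicuna-13B is fine-tuned on a mix of three datasets:

OpenAssistant Conversations Dataset (OASST1): a human-generated, human-annotated assistant-style conversation corpus comprising 161,443 messages distributed across 66,497 conversation trees, in 35 different languages.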
GPT4All Prompt Generations: a dataset of 400k prompts and responses generated by GPT-3.5-Turbo.
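Alpaca: a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine.

The reward model used during RLHF was also trained on the OpenAssistant Conversations Dataset (OASST1), along with two other datasets:

Anthropic HH-RLHF: a dataset of preferences about AI assistant helpfulness and harmlessness.
Stanford Human Preferences Dataset: a dataset of 385K collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to legal advice.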
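
Preference datasets like these pair a preferred response with a rejected one. A common way to train a reward model on such pairs is a pairwise ranking loss, sketched below. This card does not state which loss was used for StableVicuna's reward model, so treat the choice as an assumption. The function name and the reward values are hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Return -log(sigmoid(r_chosen - r_rejected)) for one comparison."""
    margin = reward_chosen - reward_rejected
    # log1p(exp(-margin)) equals -log(sigmoid(margin)); branch for stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up scalar rewards for three (chosen, rejected) pairs.
pairs = [(2.0, 0.5), (0.1, 0.1), (-1.0, 1.0)]
losses = [pairwise_preference_loss(c, r) for c, r in pairs]
print([round(loss, 4) for loss in losses])
```

The loss shrinks as the chosen response is scored further above the rejected one, and grows when the ordering is reversed.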

Training Procedure

CarperAI/stable-vicuna-13b-delta was trained using PPO as implemented in trlX, with the following configuration:

Hyperparameter	Value
num_rollouts	128
chunk_size	16
ppo_epochs	4
init_kl_coef	0.1
target	6
horizon	10000
gamma	1
lam	0.95
cliprange	0.2
cliprange_value	0.2
vf_coef	1.0
scale_reward	None
cliprange_reward	10
generation_kwargs
max_length	512
min_length	48
top_k	0.0
top_p	1.0
do_sample	True
temperature	1.0
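
The `init_kl_coef`, `target`, and `horizon` entries suggest an adaptive KL penalty: the coefficient starts at 0.1 and is nudged so that the measured KL divergence from the reference model drifts toward 6. The sketch below shows a commonly used proportional controller for this. The ±0.2 clip on the error and the use of `num_rollouts` as the step count are assumptions, and the class is an illustration rather than trlX's code.

```python
class AdaptiveKLController:
    """Proportional controller that nudges the KL coefficient toward a target KL."""

    def __init__(self, init_kl_coef, target, horizon):
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl, n_steps):
        # Relative error, clipped to +/-0.2 (the clip range is an assumption).
        error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        self.value *= 1.0 + error * n_steps / self.horizon
        return self.value


controller = AdaptiveKLController(init_kl_coef=0.1, target=6, horizon=10000)
# Hypothetical measured KL values; n_steps=128 mirrors num_rollouts (assumed).
for measured_kl in (12.0, 9.0, 6.0, 3.0):
    coef = controller.update(measured_kl, n_steps=128)
    print(f"KL={measured_kl:4.1f} -> kl_coef={coef:.6f}")
```

With a horizon of 10000, each update changes the coefficient by well under one percent, so the penalty adapts slowly.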

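Use and Limitations

Intended Use

This model is intended for text generation with a focus on conversational tasks. Users may further fine-tune the model on their own data to improve its performance on their specific tasks, in accordance with the non-commercial license.

Limitations and Bias

The base LLaMA model is trained on various data, some of which may contain offensive, harmful, and biased content that can lead to toxic behavior. See Section 5.1 of the LLaMA paper. We have not performed any studies to determine how fine-tuning on the datasets listed above affects the model's behavior and toxicity. Do not treat chat responses from this model as a substitute for human judgment or as a source of truth. Please use it responsibly.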

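🔧 Technical Details

StableVicuna-13B is based on the Vicuna-13B v0 model and fine-tuned with RLHF via proximal policy optimization (PPO). Training used several conversational and instruction datasets to improve performance on dialogue tasks. The reward model was likewise trained on several preference datasets, so that the generated text better matches human preferences.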
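
To make the PPO step concrete, here is a minimal sketch of the clipped surrogate loss for a single token, using the `cliprange` of 0.2 from the training configuration. It shows the textbook objective only. trlX's implementation also includes a value loss, batching, and masking that are omitted here, and the function name and input values are hypothetical.

```python
import math


def clipped_policy_loss(logprob_new, logprob_old, advantage, cliprange=0.2):
    """PPO clipped surrogate loss for one token (lower is better)."""
    ratio = math.exp(logprob_new - logprob_old)
    clipped_ratio = max(1.0 - cliprange, min(1.0 + cliprange, ratio))
    # Taking the larger (more pessimistic) loss removes the incentive to move
    # the policy far outside the [1 - cliprange, 1 + cliprange] band.
    return max(-advantage * ratio, -advantage * clipped_ratio)


# Made-up log-probabilities and advantages.
unchanged = clipped_policy_loss(-1.0, -1.0, advantage=1.0)   # ratio is exactly 1
clipped = clipped_policy_loss(-0.5, -1.0, advantage=1.0)     # ratio above 1.2
penalized = clipped_policy_loss(-0.5, -1.0, advantage=-1.0)  # negative advantage
print(unchanged, clipped, penalized)
```

When the advantage is positive and the ratio exceeds 1.2, the clipped term caps the gain. When the advantage is negative, the unclipped term dominates, so the penalty still grows with the ratio.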

📄 License

The delta weights are licensed under CC-BY-NC-SA-4.0. The base LLaMA model weights are under Meta's non-commercial bespoke license.

Acknowledgements

This work would not have been possible without the support of Stability AI.

Citations

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

@misc{vicuna2023,
    title = {Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality},
    url = {https://vicuna.lmsys.org},
    author = {Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.},
    month = {March},
    year = {2023}
}

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

@software{leandro_von_werra_2023_7790115,
  author       = {Leandro von Werra and
                  Alex Havrilla and
                  Max reciprocated and
                  Jonathan Tow and
                  Aman cat-state and
                  Duy V. Phung and
                  Louis Castricato and
                  Shahbuland Matiana and
                  Alan and
                  Ayush Thakur and
                  Alexey Bukhtiyarov and
                  aaronrmm and
                  Fabrizio Milo and
                  Daniel and
                  Daniel King and
                  Dong Shin and
                  Ethan Kim and
                  Justin Wei and
                  Manuel Romero and
                  Nicky Pochinkov and
                  Omar Sanseviero and
                  Reshinth Adithyan and
                  Sherman Siu and
                  Thomas Simonini and
                  Vladimir Blagojevic and
                  Xu Song and
                  Zack Witten and
                  alexandremuzio and
                  crumb},
  title        = {{CarperAI/trlx: v0.6.0: LLaMa (Alpaca), Benchmark 
                   Util, T5 ILQL, Tests}},
  month        = mar,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.6.0},
  doi          = {10.5281/zenodo.7790115},
  url          = {https://doi.org/10.5281/zenodo.7790115}
}