StableVicuna-13B Open-Source Conversational Model - Optimized Through Fine-Tuning, Supporting a Variety of Dialogue Scenarios

Stable Vicuna 13b Delta

Developed by CarperAI

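StableVicuna-13B is a model based on Vicuna-13B v0, fine-tuned on various conversational and instruction datasets using reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO).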

Large Language Model

Transformers

English #RLHF fine-tuning #Multi-turn dialogue optimization #Instruction following

Release Time : 4/26/2023

Model Overview

StableVicuna-13B is an autoregressive language model based on the LLaMA transformer architecture that generates text with a focus on conversational tasks.

Model Features

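Fine-tuning with reinforcement learning

Fine-tuned on various conversational and instruction datasets using reinforcement learning from human feedback (RLHF) with Proximal Policy Optimization (PPO).

Training on multiple datasets

Trained on high-quality datasets including OpenAssistant, GPT4All, and Alpaca.

Dialogue optimization

Generates text with a focus on conversational tasks, producing coherent and meaningful dialogue responses.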

Model Capabilities

Text generation

Dialogue systems

Instruction following

Use Cases

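Dialogue systems

Intelligent assistants

Can be used to build intelligent assistants that understand and respond to user instructions and questions.

Text generation

Code generation

Generates code snippets, such as Python scripts, from user instructions.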

🚀 StableVicuna-13B

StableVicuna-13B is a Vicuna-13B v0 model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instruction datasets. The model generates text with a focus on conversational tasks.

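🚀 Quick Start

Applying the Delta Weights

StableVicuna-13B cannot be used from the CarperAI/stable-vicuna-13b-delta weights alone. To obtain the working model, you must add the delta in CarperAI/stable-vicuna-13b-delta back onto the original LLaMA 13B weights. The apply_delta.py script is provided to automate the conversion, and you can run it as follows.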

python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
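
Conceptually, applying a delta means adding each delta tensor to the base tensor of the same name. The sketch below illustrates only that idea on plain Python lists. It is not the apply_delta.py script itself: the function name `apply_delta_sketch` and the toy parameter names and values are hypothetical, and the real script operates on model checkpoints and may handle further details.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_delta_sketch(base: Weights, delta: Weights) -> Weights:
    """Return target weights where target[name] = base[name] + delta[name].

    Hypothetical illustration only: real checkpoints hold tensors, not lists.
    """
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    target: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise addition recovers the fine-tuned weights.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy example with made-up parameter names and values.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -1.0], "layer.bias": [0.5]}
target = apply_delta_sketch(base, delta)
print(target)
```

Distributing only the delta is why the delta repository is not usable on its own: without the base LLaMA weights, the sum cannot be formed.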

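Using the Model

Once the delta weights are applied, you can start chatting with the model using the transformers library. Following a suggestion from the Vicuna team for Vicuna v0, install transformers at this specific commit.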

pip install git+https://github.com/huggingface/transformers@c612628045822f909020f7eb6784c79700813eda

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the weights produced by apply_delta.py
tokenizer = AutoTokenizer.from_pretrained("path/to/stable-vicuna-13b-applied")
model = AutoModelForCausalLM.from_pretrained("path/to/stable-vicuna-13b-applied")
model.half().cuda()  # cast to fp16 and move the model to the GPU

# Prompts use the "### Human: ... ### Assistant:" format
prompt = """\
### Human: Write a Python script for text classification using Transformers and PyTorch
### Assistant:\
"""

# Tokenize the prompt and sample up to 256 new tokens
inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    top_p=1.0,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
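
The example above uses a single-turn prompt in the `### Human:` / `### Assistant:` format. A small helper can build that string programmatically. The sketch below is a minimal illustration: the function name `build_prompt` is hypothetical, and extending the format to earlier turns by repeating the same two markers is an assumption, not something this card specifies.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Format a conversation in the '### Human: / ### Assistant:' style.

    `history` holds earlier (human, assistant) turns. Repeating the markers
    for multi-turn input is an assumption made for this sketch.
    """
    lines = []
    for human, assistant in history:
        lines.append(f"### Human: {human}")
        lines.append(f"### Assistant: {assistant}")
    lines.append(f"### Human: {user_message}")
    # Leave the final assistant turn open so the model completes it.
    lines.append("### Assistant:")
    return "\n".join(lines)


prompt = build_prompt(
    [],
    "Write a Python script for text classification using Transformers and PyTorch",
)
print(prompt)
```

With an empty history, the helper reproduces the prompt string used in the example above.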

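✨ Key Features

Fine-tuned with reinforcement learning on various conversational and instruction datasets
Specialized in text generation with a focus on conversational tasks

📦 Installation

As described in the Quick Start section above, you need to apply the delta weights and install the transformers library.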

📚 Documentation

Model Details

Property	Details
Model type	StableVicuna-13B is an autoregressive language model based on the LLaMA transformer architecture.
Training data	OpenAssistant/oasst1, nomic-ai/gpt4all_prompt_generations, tatsu-lab/alpaca
Developed by	Duy Phung of CarperAI
Language	English
Library	trlX
License for delta weights	CC-BY-NC-SA-4.0
Contact	For questions and comments about the model, visit the CarperAI and StableFoundation Discord servers.

Hyperparameter	Value
\(n_\text{parameters}\)	13B
\(d_\text{model}\)	5120
\(n_\text{layers}\)	40
\(n_\text{heads}\)	40
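
As a sanity check, the values in this table can be related to the 13B parameter count with a rough tally. The sketch below takes d_model, n_layers, and n_heads from the table. Everything else is an assumption about a LLaMA-style architecture that this card does not state: a vocabulary of 32,000 tokens, a feed-forward width of 13,824, bias-free linear layers, a three-matrix gated feed-forward block, two norm weight vectors per layer, and untied input and output embeddings.

```python
# Values from the table above.
d_model = 5120
n_layers = 40
n_heads = 40

# Assumptions (not stated in this card) about a LLaMA-style 13B model.
vocab_size = 32_000
ffn_dim = 13_824

head_dim = d_model // n_heads

attention = 4 * d_model * d_model       # Q, K, V, and output projections
feed_forward = 3 * d_model * ffn_dim    # gate, up, and down projections
norms = 2 * d_model                     # two norm weight vectors per layer
per_layer = attention + feed_forward + norms

embeddings = 2 * vocab_size * d_model   # input embedding + untied output head
final_norm = d_model
total = n_layers * per_layer + embeddings + final_norm

print(f"head_dim = {head_dim}")
print(f"total parameters ~ {total / 1e9:.2f}B")
```

Under these assumptions the tally lands close to the 13B figure in the table, and each attention head has dimension d_model / n_heads = 128.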

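Training

Training Dataset

StableVicuna-13B is fine-tuned on a mix of three datasets.

OpenAssistant Conversations Dataset (OASST1): a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees, in 35 different languages.
GPT4All Prompt Generations: a dataset of 400k prompts and responses generated by GPT-3.5-Turbo.
Alpaca: a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine.

The reward model used during RLHF was also trained on the OpenAssistant Conversations Dataset (OASST1), along with two other datasets.

Anthropic HH-RLHF: a dataset of preferences about AI assistant helpfulness and harmlessness.
Stanford Human Preferences Dataset: a dataset of 385K collective human preferences over responses to questions and instructions in 18 different subject areas, from cooking to legal advice.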
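
This card does not describe how the reward model was trained on these preference datasets. For orientation only, a common way to learn from pairs of preferred and rejected responses is a pairwise logistic loss, -log sigmoid(r_chosen - r_rejected). The sketch below shows that generic loss on scalar rewards; the function name `pairwise_preference_loss` is hypothetical, and nothing here is confirmed to match the reward model used for StableVicuna-13B.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Generic pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption for illustration; not confirmed as the loss used for this model.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the preferred response scores higher, large otherwise.
print(pairwise_preference_loss(2.0, 0.0))
print(pairwise_preference_loss(0.0, 2.0))
```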

Training Procedure

CarperAI/stable-vicuna-13b-delta was trained using PPO as implemented in trlX with the following configuration.

Hyperparameter	Value
num_rollouts	128
chunk_size	16
ppo_epochs	4
init_kl_coef	0.1
target	6
horizon	10000
gamma	1
lam	0.95
cliprange	0.2
cliprange_value	0.2
vf_coef	1.0
scale_reward	None
cliprange_reward	10
generation_kwargs
max_length	512
min_length	48
top_k	0.0
top_p	1.0
do_sample	True
temperature	1.0
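
To give a feel for what some of these settings control, the sketch below shows two standard PPO-for-RLHF building blocks in plain Python. The first is an adaptive KL coefficient in the style of Ziegler et al. (2019), driven by init_kl_coef, target, and horizon. The second is generalized advantage estimation (GAE), driven by gamma and lam. This is an illustrative sketch, not the trlX implementation: the class and function names, the ±0.2 clipping of the KL error, and the toy inputs are assumptions.

```python
from typing import List


class AdaptiveKLController:
    """Adaptive KL penalty coefficient (sketch, after Ziegler et al., 2019)."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int) -> None:
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error against the target KL, clipped (assumed +/-0.2)
        # so that a single update changes the coefficient only slightly.
        error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        self.value *= 1.0 + error * n_steps / self.horizon


def gae_advantages(
    rewards: List[float], values: List[float], gamma: float, lam: float
) -> List[float]:
    """Generalized advantage estimation over one finished episode."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # The value after the final step is taken to be zero.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Settings from the table; the observed KL and batch size are toy inputs.
kl = AdaptiveKLController(init_kl_coef=0.1, target=6.0, horizon=10_000)
kl.update(current_kl=12.0, n_steps=128)
print(kl.value)

# Toy rewards and value estimates, with gamma and lam from the table.
print(gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=0.95))
```

When the measured KL divergence exceeds the target, the coefficient grows and the policy is penalized more for drifting from the reference model; when it falls below the target, the coefficient shrinks.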

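Use and Limitations

Intended Use

This model is intended for text generation with a focus on conversational tasks. Users may further fine-tune the model on their own data to improve its performance on their specific tasks, in accordance with the non-commercial license.

Limitations and Bias

The base LLaMA model is trained on various data, some of which may contain offensive, harmful, and biased content that can lead to toxic behavior. See Section 5.1 of the LLaMA paper. No studies have been performed to determine how fine-tuning on the datasets listed above affects the model's behavior and toxicity. Do not treat chat responses from this model as a substitute for human judgment or as a source of truth. Please use it responsibly.

Acknowledgements

This work would not have been possible without the support of Stability AI.

Citations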

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

@misc{vicuna2023,
    title = {Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality},
    url = {https://vicuna.lmsys.org},
    author = {Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.},
    month = {March},
    year = {2023}
}

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

@software{leandro_von_werra_2023_7790115,
  author       = {Leandro von Werra and
                  Alex Havrilla and
                  Max reciprocated and
                  Jonathan Tow and
                  Aman cat-state and
                  Duy V. Phung and
                  Louis Castricato and
                  Shahbuland Matiana and
                  Alan and
                  Ayush Thakur and
                  Alexey Bukhtiyarov and
                  aaronrmm and
                  Fabrizio Milo and
                  Daniel and
                  Daniel King and
                  Dong Shin and
                  Ethan Kim and
                  Justin Wei and
                  Manuel Romero and
                  Nicky Pochinkov and
                  Omar Sanseviero and
                  Reshinth Adithyan and
                  Sherman Siu and
                  Thomas Simonini and
                  Vladimir Blagojevic and
                  Xu Song and
                  Zack Witten and
                  alexandremuzio and
                  crumb},
  title        = {{CarperAI/trlx: v0.6.0: LLaMa (Alpaca), Benchmark 
                   Util, T5 ILQL, Tests}},
  month        = mar,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.6.0},
  doi          = {10.5281/zenodo.7790115},
  url          = {https://doi.org/10.5281/zenodo.7790115}
}