Vapor_v2_7Bオープンソース大規模言語モデル - 13種類の言語処理をサポートする効率的なコミュニケーションツール

ホーム

Vapor V2 7B

FourOhFourによって開発

Qwen/Qwen2.5-7Bモデルを多言語データセットで微調整した大規模言語モデルで、13言語の処理をサポート

大規模言語モデル

Transformers

オープンソースライセンス:Apache-2.0 #多言語対話 #長文処理 #命令微調整

ダウンロード数 60

リリース時間 : 9/20/2024

モデル概要

これはQwen2.5-7Bモデルを基に微調整した多言語大規模言語モデルで、対話生成と命令追従タスクに特化し、様々な専門分野のデータセットで訓練されています

モデル特徴

多言語サポート

主要なアジア言語とヨーロッパ言語を含む13言語のテキスト生成と理解をサポート

長文脈処理

8192トークンまでの長文脈処理能力をサポート

多分野知識

医学、軍事、推論など様々な専門分野のデータセットで訓練

効率的な訓練

flash attentionや勾配チェックポイントなどの技術で訓練効率を最適化

モデル能力

多言語テキスト生成

命令追従

対話システム

知識質問応答

専門分野相談

使用事例

インテリジェントアシスタント

多言語カスタマーサービスボット

多国籍企業向けに多言語カスタマーサポートを提供

教育

言語学習アシスタント

学習者が複数言語のライティングと会話を練習するのを支援

専門相談

医療情報相談

基礎的な医学知識と健康アドバイスを提供

軍事サバイバルガイド

軍事と野外生存に関する専門知識を提供

🚀 outputs/out

このモデルは、Qwen/Qwen2.5 - 7B をNoneデータセットでファインチューニングしたバージョンです。評価セットでは以下の結果を達成しています。

損失率: 0.7923

axolotl設定を表示

axolotlバージョン: 0.4.1

base_model: Qwen/Qwen2.5-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
    type: sharegpt
    conversation: chatml
  - path: NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: NewEden/Gryphe-Sonnet-3.5-35k-Subset
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Reasoning-1shot_ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/GU_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Medical_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: AquaV/Resistance-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: AquaV/US-Army-Survival-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
    type: sharegpt
    conversation: chatml

chat_template: chatml

val_set_size: 0.002
output_dir: ./outputs/out

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

wandb_project: qwen7B
wandb_entity:
wandb_watch:
wandb_name: qwen7B
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2

debug:
deepspeed: 
fsdp:
fsdp_config:

special_tokens:
  pad_token: <pad>

📚 ドキュメント

モデルの説明

詳細情報は後日提供予定です。

想定用途と制限事項

詳細情報は後日提供予定です。

学習と評価データ

詳細情報は後日提供予定です。

学習手順

学習ハイパーパラメータ

学習中に使用されたハイパーパラメータは以下の通りです。

学習率: 1e - 05
学習バッチサイズ: 1
評価バッチサイズ: 1
シード: 42
分散タイプ: マルチGPU
デバイス数: 4
勾配累積ステップ数: 32
総学習バッチサイズ: 128
総評価バッチサイズ: 4
オプティマイザ: Adam (betas=(0.9, 0.999), epsilon=1e - 08)
学習率スケジューラタイプ: cosine
学習率スケジューラウォームアップステップ数: 46
エポック数: 2

学習結果

学習損失率	エポック	ステップ	検証損失率
1.0297	0.0043	1	1.1468
0.8512	0.2515	58	0.8729
0.8496	0.5030	116	0.8193
0.8175	0.7546	174	0.8033
0.7868	1.0041	232	0.7961
0.8119	1.2555	290	0.7934
0.799	1.5069	348	0.7926
0.7891	1.7583	406	0.7923