EVA Qwen2.5 14B V0.2

EVA-UNIT-01によって開発

ロールプレイ/ストーリーライティングに特化した専門モデルで、Qwen2.5-14Bをベースに全パラメータ微調整を行い、合成データと自然データを融合しています。

大規模言語モデル

Transformers

オープンソースライセンス:Apache-2.0 #ロールプレイ専門家 #長文ストーリー生成 #クリエイティブライティング強化

ダウンロード数 287

リリース時間 : 11/6/2024

モデル概要

このモデルはロールプレイとストーリーライティングタスクに特化しており、全パラメータ微調整により一貫性、指示の遵守、長文脈理解能力を最適化しています。ChatMLプロンプトフォーマットを採用し、クリエイティブライティングやインタラクティブなロールプレイシナリオに適しています。

モデル特徴

最適化されたロールプレイ能力

ロールプレイシナリオに特化して微調整されており、キャラクター設定に沿った一貫性のある対話を生成可能

長文脈理解

10240トークンの長文脈ウィンドウをサポートし、複雑なストーリー展開に適している

クリエイティブライティング強化

様々なライティングデータセットを融合し、ストーリー創作の多様性と創造性を向上

最適化された指示遵守

v0.1バージョンと比較して指示理解と実行能力が大幅に向上

モデル能力

ロールプレイ対話生成

ストーリー創作

クリエイティブライティング

長文生成

指示フォロー

使用事例

クリエイティブライティング

小説創作

一貫性のあるストーリー展開とキャラクター対話を生成

キャラクターの性格とストーリー展開が一貫した長文を生成可能

脚本執筆

プロンプトに基づいて脚本の一部を生成

シーンとキャラクターの一貫性を維持可能

インタラクティブエンターテインメント

ロールプレイゲーム

ゲームNPCとして動的な対話を提供

キャラクター設定に基づいた性格に合った返答を生成可能

バーチャルキャラクターチャット

ユーザーとロールプレイ対話を実施

キャラクターの一貫性を保ちつつストーリーを展開可能

library_name: transformers license: apache-2.0 base_model: Qwen/Qwen2.5-14B datasets:

anthracite-org/kalo-opus-instruct-22k-no-refusal
Nopm/Opus_WritingStruct
Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
Gryphe/Sonnet3.5-Charcard-Roleplay
Gryphe/ChatGPT-4o-Writing-Prompts
Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
nothingiisreal/Reddit-Dirty-And-WritingPrompts
allura-org/Celeste-1.x-data-mixture
cognitivecomputations/dolphin-2.9.3 tags:
generated_from_trainer model-index:
name: EVA-Qwen2.5-14B-SFFT-v0.2 results: []

EVA Qwen2.5-14B v0.2

A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-14B on mixture of synthetic and natural data.
It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.

Version notes for 0.2: Now using the refined dataset from 32B 0.2. Major improvements in coherence, instruction following and long-context comprehension over 14B v0.1.

Prompt format is ChatML.

Recommended sampler values:

Temperature: 0.8
Min-P: 0.05
Top-A: 0.3
Repetition Penalty: 1.03

Recommended SillyTavern presets (via CalamitousFelicitousness):

Training data:

Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's card for details.
Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
Synthstruct and SynthRP datasets by Epiculous
A subset from Dolphin-2.9.3, including filtered version of not_samantha and a small subset of systemchat.

Training time and hardware:

3 hours on 8xH100 SXM, provided by FeatherlessAI

Model was created by Kearm, Auri and Cahvay.

Special thanks:

to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.
to FeatherlessAI for generously providing 8xH100 SXM node for training of this model
to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data
and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.

See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-14B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

# plugins:
#   - axolotl.integrations.spectrum.SpectrumPlugin

# spectrum_top_fraction: 0.5
# # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
# spectrum_model_name: Qwen/Qwen2.5-32B

datasets:
  - path: datasets/Celeste_Filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/deduped_not_samantha_norefusals.jsonl
    type: sharegpt
  - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
    type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.001
output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2

sequence_len: 10240
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 64
# lora_alpha: 128
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

base_model: Qwen/Qwen2.5-14B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: datasets/Celeste_Filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/deduped_not_samantha_norefusals.jsonl
    type: sharegpt
  - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
    type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.005
output_dir: ./EVA-Qwen2.5-14B-SFFT-v0.2

sequence_len: 10240
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.15.mlp.down_proj
- model.layers.11.mlp.down_proj
- model.layers.12.mlp.down_proj
- model.layers.34.mlp.down_proj
- model.layers.44.mlp.down_proj
- model.layers.45.mlp.down_proj
- model.layers.9.mlp.down_proj
- model.layers.41.mlp.down_proj
- model.layers.33.mlp.down_proj
- model.layers.43.mlp.down_proj
- model.layers.40.mlp.down_proj
- model.layers.13.mlp.down_proj
- model.layers.8.mlp.down_proj
- model.layers.39.mlp.down_proj
- model.layers.10.mlp.down_proj
- model.layers.14.mlp.down_proj
- model.layers.16.mlp.down_proj
- model.layers.31.mlp.down_proj
- model.layers.32.mlp.down_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.43.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.42.mlp.gate_proj
- model.layers.32.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.33.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.39.mlp.gate_proj
- model.layers.41.mlp.gate_proj
- model.layers.40.mlp.gate_proj
- model.layers.30.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.31.mlp.gate_proj
- model.layers.37.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.10.mlp.gate_proj
- model.layers.38.mlp.gate_proj
- model.layers.36.mlp.gate_proj
- model.layers.12.mlp.gate_proj
- model.layers.13.mlp.gate_proj
# mlp.up_proj layers
- model.layers.1.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.11.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.16.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.19.mlp.up_proj
- model.layers.10.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.20.mlp.up_proj
- model.layers.21.mlp.up_proj
- model.layers.18.mlp.up_proj
- model.layers.37.mlp.up_proj
- model.layers.38.mlp.up_proj
- model.layers.39.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.41.mlp.up_proj
- model.layers.27.mlp.up_proj
- model.layers.28.mlp.up_proj
- model.layers.36.mlp.up_proj
# self_attn.k_proj layers
- model.layers.47.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.38.self_attn.k_proj
- model.layers.14.self_attn.k_proj
- model.layers.7.self_attn.k_proj
- model.layers.12.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.10.self_attn.k_proj
- model.layers.8.self_attn.k_proj
- model.layers.6.self_attn.k_proj
- model.layers.9.self_attn.k_proj
- model.layers.45.self_attn.k_proj
- model.layers.42.self_attn.k_proj
- model.layers.40.self_attn.k_proj
- model.layers.5.self_attn.k_proj
- model.layers.0.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.34.self_attn.k_proj
- model.layers.13.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.12.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.4.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.28.self_attn.o_proj
- model.layers.32.self_attn.o_proj
- model.layers.35.self_attn.o_proj
- model.layers.39.self_attn.o_proj
- model.layers.3.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.1.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.44.self_attn.q_proj
- model.layers.29.self_attn.q_proj
- model.layers.45.self_attn.q_proj
- model.layers.43.self_attn.q_proj
- model.layers.32.self_attn.q_proj
- model.layers.38.self_attn.q_proj
- model.layers.19.self_attn.q_proj
- model.layers.42.self_attn.q_proj
- model.layers.34.self_attn.q_proj
- model.layers.36.self_attn.q_proj
- model.layers.40.self_attn.q_proj
- model.layers.26.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.39.self_attn.q_proj
- model.layers.41.self_attn.q_proj
- model.layers.33.self_attn.q_proj
- model.layers.35.self_attn.q_proj
- model.layers.25.self_attn.q_proj
- model.layers.30.self_attn.q_proj
- model.layers.27.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.0.self_attn.v_proj
- model.layers.7.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.10.self_attn.v_proj
- model.layers.41.self_attn.v_proj
- model.layers.32.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.33.self_attn.v_proj
- model.layers.42.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.9.self_attn.v_proj
- model.layers.14.self_attn.v_proj
- model.layers.35.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.13.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.34.self_attn.v_proj
- model.layers.5.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.37.self_attn.v_proj
- model.layers.27.self_attn.v_proj
- model.layers.11.self_attn.v_proj

wandb_project: EVA-Qwen2.5-14B-SFFT-v0.2
wandb_entity:
wandb_watch:
wandb_name: Unit-02
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00005
max_grad_norm: 3

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: "unsloth"
# gradient_checkpointing_kwargs:
#   use_reentrant: true
early_stopping_patience:
resume_from_checkpoint: 
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 4
save_safetensors: true
hub_model_id: 
hub_strategy: 
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: false
#   fsdp_offload_params: true
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
#   fsdp_activation_checkpointing: true
#   fsdp_state_dict_type: SHARDED_STATE_DICT  # Changed from FULL_STATE_DICT
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false  # Added
#   fsdp_backward_prefetch: "BACKWARD_PRE"  # Added
#   fsdp_backward_prefetch_limit: 1  # Added
#   fsdp_mixed_precision: BF16  # Added