EVA Qwen2.5 32B V0.2

EVA-UNIT-01によって開発

ロールプレイング/ストーリーライティングに特化した専門モデルで、Qwen2.5-32Bをベースにフルパラメータ微調整を行い、合成データと自然データを融合しています。

大規模言語モデル

Transformers

オープンソースライセンス:Apache-2.0 #ロールプレイング専門家 #クリエイティブライティング強化 #マルチソースデータ統合

ダウンロード数 625

リリース時間 : 11/5/2024

モデル概要

Qwen2.5-32B大規模言語モデルをベースにフルパラメータ微調整を実施し、ロールプレイングやストーリーライティングタスクに特化。複数の高品質データセットを統合することで創造性と表現力を向上させています。

モデル特徴

高品質データ統合

Celeste 70Bデータ混合スキーム及び複数の高品質ライティングデータセットを統合し、モデルのパフォーマンスを大幅に向上

ロールプレイング最適化

ロールプレイングシナリオに特化して最適化され、複雑なキャラクターインタラクションや状況構築をサポート

創作スタイルの多様性

様々なライティングスタイルのデータセットを融合することで、多様な創作ニーズに対応可能

データ汚染修正

v0.2バージョンでは以前のバージョンのデータ汚染問題を修正し、生成品質がより安定

モデル能力

ロールプレイング対話生成

クリエイティブストーリーライティング

ライティングプロンプト応答

マルチターン対話維持

スタイリッシュなテキスト生成

使用事例

クリエイティブライティング

ストーリー創作支援

ユーザーが提供したプロンプトに基づき、一貫性のあるストーリーパラグラフを生成

プロット展開とキャラクター造形を備えた完全なストーリーを生成可能

ライティングインスピレーション喚起

シンプルなプロンプトから多様なクリエイティブライティングの方向性を展開

多様なライティングアイデアとプロット展開の可能性を提供

インタラクティブエンターテインメント

ロールプレイングゲーム

ゲーム内のNPCとしてインテリジェントな対話インタラクションを実現

キャラクターの一貫性を維持しつつ深みのある対話が可能

バーチャルキャラクター作成

キャラクターカードに基づき設定に沿った対話と行動を生成

キャラクター特性を正確に把握し、設定に合致したレスポンスを生成

library_name: transformers license: apache-2.0 datasets:

anthracite-org/kalo-opus-instruct-22k-no-refusal
Nopm/Opus_WritingStruct
Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
Gryphe/Sonnet3.5-Charcard-Roleplay
Gryphe/ChatGPT-4o-Writing-Prompts
Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
nothingiisreal/Reddit-Dirty-And-WritingPrompts
allura-org/Celeste-1.x-data-mixture
cognitivecomputations/dolphin-2.9.3 base_model: Qwen/Qwen2.5-32B tags:
generated_from_trainer model-index:
name: EVA-Qwen2.5-32B-SFFT-v0.1 results: []

EVA Qwen2.5-32B v0.2

A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data.
It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.

Dedicated to Nev.

Version notes for 0.2: Basically, reprocessed the whole dataset again, due to a severe mistake in previously used pipeline, which left the data poisoned with a lot of non-unicode characters. Now, no more weird generation artifacts, and more stability. Major kudos to Cahvay for his work on fixing this critical issue.

Prompt format is ChatML.

Recommended sampler values:

Temperature: 1
Min-P: 0.05
Top-A: 0.2
Repetition Penalty: 1.03

Recommended SillyTavern presets (via CalamitousFelicitousness):

Training data:

Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's card for details.
Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe
A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe
Synthstruct and SynthRP datasets by Epiculous
A subset from Dolphin-2.9.3, including filtered version of not_samantha and a small subset of systemchat.

Training time and hardware:

7 hours on 8xH100 SXM, provided by FeatherlessAI

Model was created by Kearm, Auri and Cahvay.

Special thanks:

to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.
to FeatherlessAI for generously providing 8xH100 SXM node for training of this model
to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data
and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.

See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2.5-32B

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

# plugins:
#   - axolotl.integrations.spectrum.SpectrumPlugin

# spectrum_top_fraction: 0.5
# # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
# spectrum_model_name: Qwen/Qwen2.5-32B

datasets:
  - path: datasets/Celeste_Filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/deduped_not_samantha_norefusals.jsonl
    type: sharegpt
  - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
    type: sharegpt
  - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
    type: sharegpt
  - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
    type: sharegpt
  - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
    type: sharegpt

chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.001
output_dir: ./EVA-Qwen2.5-32B-SFFT-v0.1

sequence_len: 10240
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 64
# lora_alpha: 128
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

unfrozen_parameters:
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.63.mlp.down_proj
- model.layers.49.mlp.down_proj
- model.layers.48.mlp.down_proj
- model.layers.45.mlp.down_proj
- model.layers.44.mlp.down_proj
- model.layers.47.mlp.down_proj
- model.layers.46.mlp.down_proj
- model.layers.43.mlp.down_proj
- model.layers.8.mlp.down_proj
- model.layers.11.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.20.mlp.down_proj
- model.layers.52.mlp.down_proj
- model.layers.39.mlp.down_proj
- model.layers.62.mlp.down_proj
- model.layers.50.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.16.mlp.down_proj
- model.layers.28.mlp.down_proj
- model.layers.53.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.31.mlp.down_proj
- model.layers.32.mlp.down_proj
- model.layers.7.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.12.mlp.down_proj
- model.layers.18.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.14.mlp.down_proj
- model.layers.13.mlp.down_proj
# mlp.gate_proj layers
- model.layers.43.mlp.gate_proj
- model.layers.61.mlp.gate_proj
- model.layers.60.mlp.gate_proj
- model.layers.44.mlp.gate_proj
- model.layers.62.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.45.mlp.gate_proj
- model.layers.37.mlp.gate_proj
- model.layers.35.mlp.gate_proj
- model.layers.59.mlp.gate_proj
- model.layers.36.mlp.gate_proj
- model.layers.30.mlp.gate_proj
- model.layers.48.mlp.gate_proj
- model.layers.38.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.31.mlp.gate_proj
- model.layers.34.mlp.gate_proj
- model.layers.58.mlp.gate_proj
- model.layers.33.mlp.gate_proj
- model.layers.39.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.32.mlp.gate_proj
- model.layers.46.mlp.gate_proj
- model.layers.42.mlp.gate_proj
- model.layers.49.mlp.gate_proj
- model.layers.57.mlp.gate_proj
- model.layers.50.mlp.gate_proj
- model.layers.47.mlp.gate_proj
- model.layers.56.mlp.gate_proj
- model.layers.63.mlp.gate_proj
- model.layers.55.mlp.gate_proj
# mlp.up_proj layers
- model.layers.61.mlp.up_proj
- model.layers.60.mlp.up_proj
- model.layers.32.mlp.up_proj
- model.layers.59.mlp.up_proj
- model.layers.58.mlp.up_proj
- model.layers.57.mlp.up_proj
- model.layers.44.mlp.up_proj
- model.layers.28.mlp.up_proj
- model.layers.35.mlp.up_proj
- model.layers.36.mlp.up_proj
- model.layers.29.mlp.up_proj
- model.layers.31.mlp.up_proj
- model.layers.34.mlp.up_proj
- model.layers.55.mlp.up_proj
- model.layers.49.mlp.up_proj
- model.layers.30.mlp.up_proj
- model.layers.53.mlp.up_proj
- model.layers.43.mlp.up_proj
- model.layers.56.mlp.up_proj
- model.layers.33.mlp.up_proj
- model.layers.54.mlp.up_proj
- model.layers.62.mlp.up_proj
- model.layers.27.mlp.up_proj
- model.layers.51.mlp.up_proj
- model.layers.52.mlp.up_proj
- model.layers.37.mlp.up_proj
- model.layers.45.mlp.up_proj
- model.layers.26.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.50.mlp.up_proj
- model.layers.48.mlp.up_proj
- model.layers.39.mlp.up_proj
# self_attn.k_proj layers
- model.layers.63.self_attn.k_proj
- model.layers.55.self_attn.k_proj
- model.layers.60.self_attn.k_proj
- model.layers.7.self_attn.k_proj
- model.layers.12.self_attn.k_proj
- model.layers.13.self_attn.k_proj
- model.layers.57.self_attn.k_proj
- model.layers.29.self_attn.k_proj
- model.layers.14.self_attn.k_proj
- model.layers.51.self_attn.k_proj
- model.layers.53.self_attn.k_proj
- model.layers.54.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.61.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.30.self_attn.k_proj
- model.layers.9.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.10.self_attn.k_proj
- model.layers.58.self_attn.k_proj
- model.layers.56.self_attn.k_proj
- model.layers.15.self_attn.k_proj
- model.layers.32.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.8.self_attn.k_proj
- model.layers.59.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.48.self_attn.k_proj
- model.layers.16.self_attn.k_proj
- model.layers.50.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.15.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.31.self_attn.o_proj
- model.layers.30.self_attn.o_proj
- model.layers.18.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.28.self_attn.o_proj
- model.layers.34.self_attn.o_proj
- model.layers.33.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.14.self_attn.o_proj
- model.layers.29.self_attn.o_proj
- model.layers.16.self_attn.o_proj
- model.layers.26.self_attn.o_proj
- model.layers.22.self_attn.o_proj
- model.layers.27.self_attn.o_proj
- model.layers.35.self_attn.o_proj
- model.layers.20.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.36.self_attn.o_proj
- model.layers.19.self_attn.o_proj
- model.layers.37.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.54.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.9.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.1.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.45.self_attn.q_proj
- model.layers.54.self_attn.q_proj
- model.layers.35.self_attn.q_proj
- model.layers.48.self_attn.q_proj
- model.layers.61.self_attn.q_proj
- model.layers.52.self_attn.q_proj
- model.layers.50.self_attn.q_proj
- model.layers.60.self_attn.q_proj
- model.layers.56.self_attn.q_proj
- model.layers.58.self_attn.q_proj
- model.layers.42.self_attn.q_proj
- model.layers.59.self_attn.q_proj
- model.layers.44.self_attn.q_proj
- model.layers.55.self_attn.q_proj
- model.layers.57.self_attn.q_proj
- model.layers.41.self_attn.q_proj
- model.layers.36.self_attn.q_proj
- model.layers.39.self_attn.q_proj
- model.layers.4.self_attn.q_proj
- model.layers.43.self_attn.q_proj
- model.layers.34.self_attn.q_proj
- model.layers.46.self_attn.q_proj
- model.layers.49.self_attn.q_proj
- model.layers.40.self_attn.q_proj
- model.layers.25.self_attn.q_proj
- model.layers.51.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.37.self_attn.q_proj
- model.layers.53.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.55.self_attn.v_proj
- model.layers.31.self_attn.v_proj
- model.layers.47.self_attn.v_proj
- model.layers.45.self_attn.v_proj
- model.layers.49.self_attn.v_proj
- model.layers.48.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.30.self_attn.v_proj
- model.layers.7.self_attn.v_proj
- model.layers.44.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.51.self_attn.v_proj
- model.layers.50.self_attn.v_proj
- model.layers.14.self_attn.v_proj
- model.layers.54.self_attn.v_proj
- model.layers.32.self_attn.v_proj
- model.layers.43.self_attn.v_proj
- model.layers.10.self_attn.v_proj
- model.layers.46.self_attn.v_proj
- model.layers.38.self_attn.v_proj
- model.layers.57.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.58.self_attn.v_proj
- model.layers.53.self_attn.v_proj
- model.layers.40.self_attn.v_proj
- model.layers.24.self_attn.v_proj
- model.layers.9.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.5.self_attn.v_proj



wandb_project: EVA-Qwen2.5-32B-SFFT-v0.2
wandb_entity:
wandb_watch:
wandb_name: Unit-02
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00005
max_grad_norm: 3

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: "unsloth"
# gradient_checkpointing_kwargs:
#   use_reentrant: true
early_stopping_patience:
resume_from_checkpoint: 
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 4
save_safetensors: true
hub_model_id: 
hub_strategy: 
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: false
#   fsdp_offload_params: true
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
#   fsdp_activation_checkpointing: true
#   fsdp_state_dict_type: SHARDED_STATE_DICT  # Changed from FULL_STATE_DICT
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: false  # Added
#   fsdp_backward_prefetch: "BACKWARD_PRE"  # Added
#   fsdp_backward_prefetch_limit: 1  # Added
#   fsdp_mixed_precision: BF16  # Added