magnum-v4-72b-FP8-Dynamic: an open-source large language model with optimized inference that aims to reproduce Claude 3 prose quality

Magnum V4 72b FP8 Dynamic

Developed by Infermatic

A 72B-parameter large language model fine-tuned from Qwen2.5-72B-Instruct. It uses dynamic FP8 quantization to improve inference efficiency and aims to reproduce the prose quality of Claude 3.

Large language model

Transformers

Language: English
Open-source license: Apache-2.0
Tags: #FP8 dynamic quantization #Claude 3 style reproduction #Long-context support

Downloads: 2,106

Release date: 10/21/2024

Model overview

This is an experimental model, instruction-tuned from Qwen2.5-72B-Instruct. It targets high-quality text generation, and in particular imitation of Claude 3's writing style.

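Model features

Dynamic FP8 quantization

The model is quantized to FP8 with AutoFP8 using dynamic quantization, which is intended to improve inference efficiency while preserving model quality.

Long-context support

Supports a context length of 32k tokens, which makes it suitable for processing long documents.

Claude-style writing

Specifically tuned to reproduce the prose quality of Claude 3 (Sonnet and Opus in particular).

Multi-dataset fine-tuning

Full-parameter fine-tuning on six datasets, intended to improve performance across a range of tasks.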
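
To make the dynamic FP8 quantization feature above more concrete, here is a minimal pure-Python sketch of the idea: the scale is derived from the values at hand rather than from a calibration set, and each scaled value is rounded to the FP8 E4M3 grid. This is an illustration only, not AutoFP8's implementation. The per-tensor scaling choice and the helper names (`round_to_e4m3`, `quantize_dynamic`, `dequantize`) are assumptions; real kernels may scale per token or per channel.

```python
import math

E4M3_MAX = 448.0   # largest finite magnitude in FP8 E4M3
E4M3_MIN_EXP = -6  # exponent of the smallest normal E4M3 value


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (3 mantissa bits), saturating at the max."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_EXP)
    step = 2.0 ** (exp - 3)  # spacing between neighbouring representable values
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_dynamic(values):
    """Hypothetical per-tensor dynamic quantization: the scale comes from the data itself."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Map FP8 grid values back to the original range."""
    return [q * scale for q in quantized]


activations = [0.02, -1.5, 3.2, 0.75, -0.004]
quantized, scale = quantize_dynamic(activations)
restored = dequantize(quantized, scale)
for original, approx in zip(activations, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Because the largest magnitude always maps to the top of the FP8 range, large values round-trip almost exactly, while values far below the maximum lose relative precision.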

Model capabilities

Long-form text generation

Dialogue systems

Creative writing

Instruction following

Role-play

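Use cases

Creative writing

Literary writing assistance

Generates high-quality, Claude-style prose and stories.

Can produce literary text in a style close to that of Claude 3.

Dialogue systems

Role-play chat

Enables high-quality character interactions on platforms such as SillyTavern.

Supports complex character setups and scenario-driven dialogue.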

🚀 Quantized version of the Magnum model

This model belongs to a series designed to reproduce the prose quality of the Claude 3 models (specifically Sonnet and Opus). It is fine-tuned from Qwen2.5-72B-Instruct.

🚀 Quick start

This quantization was made for infermatic.ai. It is a dynamic FP8 quantization of anthracite-org/magnum-v4-72b, produced with AutoFP8.

✨ Key features

Designed to reproduce the prose quality of the Claude 3 models
Fine-tuned from Qwen2.5-72B-Instruct

💻 Usage examples

Basic usage

A typical input, in ChatML format, looks like this:

<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
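
The prompt layout above can be assembled programmatically. The following is a minimal sketch; the function name `build_chatml_prompt` and the message-dictionary shape are assumptions made for illustration and are not part of the model card.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Join role/content messages into a single ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the next reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The final open `<|im_start|>assistant` line is what tells the model whose turn comes next; generation is normally stopped when the model emits `<|im_end|>`.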

Advanced usage

Templates for use with SillyTavern are provided.

Context template

{
  "story_string": "<|im_start|>system\n{{#if system}}{{system}}\n{{/if}}{{#if wiBefore}}{{wiBefore}}\n{{/if}}{{#if description}}{{description}}\n{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}\n{{/if}}{{#if scenario}}Scenario: {{scenario}}\n{{/if}}{{#if wiAfter}}{{wiAfter}}\n{{/if}}{{#if persona}}{{persona}}\n{{/if}}{{trim}}<|im_end|>\n",
  "example_separator": "",
  "chat_start": "",
  "use_stop_strings": false,
  "allow_jailbreak": false,
  "always_force_name2": true,
  "trim_sentences": false,
  "include_newline": false,
  "single_line": false,
  "name": "Magnum ChatML"
}
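
To show how the `story_string` above expands, here is a rough sketch of a renderer for the three constructs it uses: `{{#if name}}...{{/if}}` blocks, `{{name}}` substitutions, and `{{trim}}`. This is not SillyTavern's actual template engine. The function `render_story_string` and the sample field values are hypothetical, and `{{trim}}` is approximated as "remove surrounding whitespace".

```python
import re

STORY_STRING = (
    "<|im_start|>system\n{{#if system}}{{system}}\n{{/if}}"
    "{{#if wiBefore}}{{wiBefore}}\n{{/if}}"
    "{{#if description}}{{description}}\n{{/if}}"
    "{{#if personality}}{{char}}'s personality: {{personality}}\n{{/if}}"
    "{{#if scenario}}Scenario: {{scenario}}\n{{/if}}"
    "{{#if wiAfter}}{{wiAfter}}\n{{/if}}"
    "{{#if persona}}{{persona}}\n{{/if}}{{trim}}<|im_end|>\n"
)


def render_story_string(template: str, fields: dict) -> str:
    """Expand a simplified subset of the template syntax (no nested blocks)."""

    def keep_if_set(match):
        name, body = match.group(1), match.group(2)
        return body if fields.get(name) else ""

    # 1. Drop conditional blocks whose field is missing or empty.
    out = re.sub(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", keep_if_set, template, flags=re.DOTALL)
    # 2. Substitute known fields; unknown macros such as {{trim}} are left in place.
    out = re.sub(r"\{\{(\w+)\}\}", lambda m: str(fields.get(m.group(1), m.group(0))), out)
    # 3. Approximate {{trim}} by removing it together with the whitespace around it.
    return re.sub(r"\s*\{\{trim\}\}\s*", "", out)


fields = {
    "system": "You are a narrator.",
    "char": "Aria",
    "personality": "curious",
    "scenario": "A rainy night.",
}
rendered = render_story_string(STORY_STRING, fields)
print(rendered)
```

Fields that are not set (here `wiBefore`, `description`, `wiAfter`, `persona`) vanish along with their trailing newline, so the system block contains no empty lines.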

Instruct template

{
  "system_prompt": "Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.\n\n<Guidelines>\n• Maintain the character persona but allow it to evolve with the story.\n• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.\n• All types of outputs are encouraged; respond accordingly to the narrative.\n• Include dialogues, actions, and thoughts in each response.\n• Utilize all five senses to describe scenarios within {{char}}'s dialogue.\n• Use emotional symbols such as \"!\" and \"~\" in appropriate contexts.\n• Incorporate onomatopoeia when suitable.\n• Allow time for {{user}} to respond with their own input, respecting their agency.\n• Act as secondary characters and NPCs as needed, and remove them when appropriate.\n• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.\n</Guidelines>\n\n<Forbidden>\n• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.\n• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.\n• Repetitive and monotonous outputs.\n• Positivity bias in your replies.\n• Being overly extreme or NSFW when the narrative context is inappropriate.\n</Forbidden>\n\nFollow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>.",
  "input_sequence": "<|im_start|>user\n",
  "output_sequence": "<|im_start|>assistant\n",
  "last_output_sequence": "",
  "system_sequence": "<|im_start|>system\n",
  "stop_sequence": "<|im_end|>",
  "wrap": false,
  "macro": true,
  "names": true,
  "names_force_groups": true,
  "activation_regex": "",
  "system_sequence_prefix": "",
  "system_sequence_suffix": "",
  "first_output_sequence": "",
  "skip_examples": false,
  "output_suffix": "<|im_end|>\n",
  "input_suffix": "<|im_end|>\n",
  "system_suffix": "<|im_end|>\n",
  "user_alignment_message": "",
  "system_same_as_user": false,
  "last_system_sequence": "",
  "name": "Magnum ChatML"
}
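
The sequence and suffix fields in the template above describe how each turn is wrapped and where generation should stop. As a small sketch under that reading (the helpers `wrap_turn` and `cut_at_stop` are hypothetical, and only a subset of the fields is copied here), the values can be applied like this:

```python
# Subset of the instruct template fields shown above.
TEMPLATE = {
    "system_sequence": "<|im_start|>system\n",
    "input_sequence": "<|im_start|>user\n",
    "output_sequence": "<|im_start|>assistant\n",
    "system_suffix": "<|im_end|>\n",
    "input_suffix": "<|im_end|>\n",
    "output_suffix": "<|im_end|>\n",
    "stop_sequence": "<|im_end|>",
}

# Which prefix/suffix pair belongs to which speaker.
FIELDS_BY_ROLE = {
    "system": ("system_sequence", "system_suffix"),
    "user": ("input_sequence", "input_suffix"),
    "assistant": ("output_sequence", "output_suffix"),
}


def wrap_turn(role: str, text: str) -> str:
    """Wrap one finished turn in its role-specific prefix and suffix."""
    prefix_key, suffix_key = FIELDS_BY_ROLE[role]
    return TEMPLATE[prefix_key] + text + TEMPLATE[suffix_key]


def cut_at_stop(generated: str) -> str:
    """Keep only the text produced before the first stop sequence."""
    return generated.split(TEMPLATE["stop_sequence"], 1)[0]


prompt = (
    wrap_turn("system", "Stay in character.")
    + wrap_turn("user", "Hello?")
    + TEMPLATE["output_sequence"]  # open assistant turn for the model to fill
)
reply = cut_at_stop("Hello, traveler!<|im_end|>\n<|im_start|>user\n")
print(prompt + reply)
```

Cutting at `stop_sequence` guards against the model continuing past its own turn and writing the user's next message.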

Axolotl config

base_model: /workspace/data/models/Qwen2.5-72B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: anthracite-org/c2_logs_32k_llama3_qwen2_v1.2
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/kalo-opus-instruct-22k-no-refusal
    type: sharegpt
    conversation: chatml
  - path: lodrick-the-lafted/kalo-opus-instruct-3k-filtered
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/nopm_claude_writing_fixed
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/kalo_opus_misc_240827
    type: sharegpt
    conversation: chatml
  - path: anthracite-org/kalo_misc_part2
    type: sharegpt
    conversation: chatml
#chat_template: chatml
shuffle_merged_datasets: true
#default_system_message: "You are an assistant that responds to the user."
dataset_prepared_path: /workspace/data/magnum-72b-data
val_set_size: 0.0
output_dir: /workspace/data/72b-fft-out

sequence_len: 32768
sample_packing: true
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project: 72b-magnum-fft
wandb_entity:
wandb_watch:
wandb_name: alter-attempt-01
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000004

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 40
evals_per_epoch:
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 2
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
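
A few of the values in the config above combine into quantities that are easier to reason about: the token budget per optimizer step, and the learning rate over time. The sketch below computes both. The GPU count and the total number of optimizer steps are not stated in the config, so `num_gpus=8` and `total_steps=500` are purely hypothetical, and the schedule is a common linear-warmup cosine formula rather than a copy of the trainer's code.

```python
import math

# Values taken from the config above.
SEQUENCE_LEN = 32768
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 4e-6
WARMUP_STEPS = 40


def tokens_per_optimizer_step(num_gpus: int) -> int:
    """Upper bound on tokens per optimizer step with packed, padded sequences."""
    return SEQUENCE_LEN * MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_gpus


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run shape: neither number appears in the config.
num_gpus = 8
total_steps = 500

print(tokens_per_optimizer_step(num_gpus))
for step in (0, 20, 40, 270, 500):
    print(step, lr_at(step, total_steps))
```

With `sample_packing` and `pad_to_sequence_len` both enabled, every micro-batch is filled to the full 32768 tokens, which is why the per-step figure is a simple product.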