24b-ms-dans-personality-engine-v1.3.0开源大模型 - 专注个性化文本生成服务

首页

24b Ms Dans Personality Engine V1.3.0 TestArticle 1

由 Dans-DiscountModels 开发

基于Mistral-Small-3.1-24B-Base-2503-hf-DanChat微调的大语言模型，专注于个性化文本生成

大型语言模型

Transformers

开源协议:Apache-2.0 #超长序列生成 #个性化对话引擎 #高效微调适配

下载量 907

发布时间 : 5/6/2025

模型简介

这是一个24B参数规模的大语言模型，基于Mistral架构进行微调，适用于个性化文本生成任务。模型使用Axolotl框架训练，支持长文本处理（序列长度33000）。

模型特点

长文本处理能力

支持长达33000的序列长度，适合处理长文档和复杂上下文

个性化生成

针对个性化文本生成任务进行了专门微调

高效训练配置

使用Axolotl框架和DeepSpeed Zero3优化进行高效训练

模型能力

文本生成

个性化内容创作

长文本处理

使用案例

内容创作

个性化文章写作

根据用户偏好生成个性化风格的文章内容

对话系统

个性化聊天机器人

构建具有特定个性的对话代理

🚀 24b-ms-dans-personality-engine-v1.3.0-TestArticle-1

本项目是基于transformers库的模型微调项目，将Dans-DiscountModels/Mistral-Small-3.1-24B-Base-2503-hf-DanChat模型在Dans-DiscountModels/pretokenization-test-6数据集上进行微调，以实现特定的功能。

查看Axolotl配置

Axolotl版本: 0.10.0.dev0

base_model: Dans-DiscountModels/Mistral-Small-3.1-24B-Base-2503-hf-DanChat
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code:

# wandb配置
wandb_project: 24b-ms-dans-personality-engine
wandb_watch:

wandb_run_id: V1.3.0-1-5 # V{版本}-{运行编号}-{尝试编号}
wandb_log_model:

# 将检查点推送到Hub
hub_model_id: Dans-DiscountModels/24b-ms-dans-personality-engine-v1.3.0-TestArticle-1
# 如何将检查点推送到Hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "every_save"
# 是否使用HF的`use_auth_token`来加载数据集。对于获取私有数据集很有用
# 与`push_dataset_to_hub`结合使用时必须为true
hf_use_auth_token: true

# 完成的模型保存位置
output_dir: ./24b-ms-dans-personality-engine

save_safetensors: true

datasets:
  - path: Dans-DiscountModels/pretokenization-test-6
    ds_type: parquet
    type:

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

adapter:
lora_model_dir:

dataset_prepared_path: ./24b-ms-dans-personality-engine
val_set_size: 0.0

sequence_len: 33000

sample_packing: true
eval_sample_packing: true

pad_to_sequence_len: true

gradient_checkpointing: true

gradient_accumulation_steps: 4
micro_batch_size: 1

num_epochs: 2

optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"

lr_scheduler: rex
learning_rate: 0.000001
cosine_min_lr_ratio:

max_grad_norm: 0.001

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

early_stopping_patience:

resume_from_checkpoint:
auto_resume_from_checkpoints: false

local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1

evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:

saves_per_epoch: 4
save_total_limit: 1

debug: false

deepspeed: deepspeed_configs/zero3_bf16.json

fsdp:
fsdp_config:

special_tokens:

🚀 快速开始

本模型是Dans-DiscountModels/Mistral-Small-3.1-24B-Base-2503-hf-DanChat在Dans-DiscountModels/pretokenization-test-6数据集上的微调版本。

📚 详细文档

模型描述

更多信息待补充。

预期用途与限制

更多信息待补充。

训练和评估数据

更多信息待补充。

训练过程

训练超参数

训练过程中使用了以下超参数：

学习率（learning_rate）: 1e-06
训练批次大小（train_batch_size）: 1
评估批次大小（eval_batch_size）: 1
随机种子（seed）: 42
分布式类型（distributed_type）: 多GPU
设备数量（num_devices）: 8
梯度累积步数（gradient_accumulation_steps）: 4
总训练批次大小（total_train_batch_size）: 32
总评估批次大小（total_eval_batch_size）: 8
优化器（optimizer）: 使用ademamix_8bit，参数为：
- beta1=0.9
- beta2=0.999
- beta3=0.999
- alpha=5
学习率调度器类型（lr_scheduler_type）: 余弦
学习率调度器热身步数（lr_scheduler_warmup_steps）: 338
训练轮数（num_epochs）: 2.0