🚀 32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1
This model is a fine-tuned version of [THUDM/GLM-4-32B-Base-0414](https://huggingface.co/THUDM/GLM-4-32B-Base-0414) on the Dans-DiscountModels/pretokenization-test-4 dataset. It achieves a loss of 1.6235 on the evaluation set.
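For quick experimentation, a minimal inference sketch with 🤗 Transformers is shown below. The model id is taken from this card; the prompt, generation settings, and device mapping are illustrative placeholders, not part of the original card.

```python
# Illustrative usage sketch; prompt and generation settings are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # training was done in bf16
    device_map="auto",    # requires `accelerate`; spreads the 32B weights across available GPUs
)

prompt = "Write a short greeting."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```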

See axolotl config
axolotl version: `0.10.0.dev0`
```yaml
base_model: THUDM/GLM-4-32B-Base-0414
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code:
wandb_project: 32b-glm4-dans-personality-engine
wandb_watch:
wandb_run_id: V1.3.0-1-4
wandb_log_model:
hub_model_id: Dans-DiscountModels/32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1
hub_strategy: "every_save"
hf_use_auth_token: true
output_dir: ./32b-glm4-dans-personality-engine
save_safetensors: true
datasets:
  - path: Dans-DiscountModels/pretokenization-test-4
    ds_type: parquet
    type:
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: false
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true
load_in_8bit: false
load_in_4bit: false
strict: false
dataset_prepared_path: ./32b-glm4-dans-personality-engine-data
val_set_size: 0.003
sequence_len: 32768
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
gradient_checkpointing: unsloth
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"
lr_scheduler: rex
learning_rate: 0.000008
cosine_min_lr_ratio:
weight_decay: 0
max_grad_norm: 0.001
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: false
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 8
save_total_limit: 1
debug: false
deepspeed: /alloc/pocketdoc/axolotl/deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:
special_tokens:
```
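Assuming the config above is saved locally as `config.yaml` (a hypothetical filename), a short PyYAML sketch like the one below can be used to inspect the key training settings programmatically:

```python
# Illustrative sketch only: load the axolotl config above (saved as config.yaml, an assumed path)
# and print the settings most relevant to reproducing the run.
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["base_model"])                    # THUDM/GLM-4-32B-Base-0414
print(cfg["sequence_len"])                  # 32768
print(cfg["micro_batch_size"],              # 1
      cfg["gradient_accumulation_steps"])   # 4
print(cfg["optimizer"], cfg["optim_args"])  # ademamix_8bit with beta1/beta2/beta3/alpha args
```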
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| License | MIT |
| Base Model | THUDM/GLM-4-32B-Base-0414 |
| Tags | axolotl, generated_from_trainer |
| Datasets | Dans-DiscountModels/pretokenization-test-4 |
| Model Name | 32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1 |
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: ademamix_8bit with args: beta1=0.9, beta2=0.999, beta3=0.999, alpha=5
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 332
- num_epochs: 2.0
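As a rough sanity check on the derived values (an illustrative calculation, not part of the generated report), the total train batch size and the warmup step count follow directly from the settings above:

```python
# Illustrative arithmetic only; the inputs are the hyperparameters listed above.
micro_batch_size = 1
grad_accum_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * grad_accum_steps * num_devices
print(total_train_batch_size)  # 32, matching the value reported above

# warmup_ratio is 0.1 in the config; the run spans roughly 3,320 optimizer steps over
# 2 epochs (the final logged step below is 3290 at epoch ~1.98), giving the 332 warmup steps.
approx_total_steps = 3320
print(round(0.1 * approx_total_steps))  # 332
```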
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6456 | 0.0006 | 1 | 1.7604 |
| 1.6538 | 0.0421 | 70 | 1.7472 |
| 1.668 | 0.0842 | 140 | 1.7132 |
| 1.5877 | 0.1264 | 210 | 1.6934 |
| 1.7524 | 0.1685 | 280 | 1.6815 |
| 1.6687 | 0.2106 | 350 | 1.6738 |
| 1.7986 | 0.2527 | 420 | 1.6691 |
| 1.8379 | 0.2948 | 490 | 1.6659 |
| 1.6813 | 0.3369 | 560 | 1.6633 |
| 1.6749 | 0.3791 | 630 | 1.6607 |
| 1.5746 | 0.4212 | 700 | 1.6585 |
| 1.7503 | 0.4633 | 770 | 1.6565 |
| 1.6143 | 0.5054 | 840 | 1.6545 |
| 1.6 | 0.5475 | 910 | 1.6527 |
| 1.7525 | 0.5897 | 980 | 1.6510 |
| 1.5861 | 0.6318 | 1050 | 1.6493 |
| 1.7439 | 0.6739 | 1120 | 1.6477 |
| 1.6129 | 0.7160 | 1190 | 1.6464 |
| 1.4729 | 0.7581 | 1260 | 1.6454 |
| 1.6923 | 0.8002 | 1330 | 1.6451 |
| 1.6498 | 0.8424 | 1400 | 1.6441 |
| 1.5815 | 0.8845 | 1470 | 1.6429 |
| 1.6209 | 0.9266 | 1540 | 1.6418 |
| 1.6685 | 0.9687 | 1610 | 1.6408 |
| 1.7472 | 1.0108 | 1680 | 1.6397 |
| 1.5719 | 1.0529 | 1750 | 1.6386 |
| 1.7247 | 1.0951 | 1820 | 1.6377 |
| 1.7098 | 1.1372 | 1890 | 1.6367 |
| 1.6367 | 1.1793 | 1960 | 1.6358 |
| 1.7014 | 1.2214 | 2030 | 1.6349 |
| 1.6622 | 1.2635 | 2100 | 1.6340 |
| 1.5958 | 1.3057 | 2170 | 1.6331 |
| 1.59 | 1.3478 | 2240 | 1.6322 |
| 1.6959 | 1.3899 | 2310 | 1.6314 |
| 1.6595 | 1.4320 | 2380 | 1.6308 |
| 1.6163 | 1.4741 | 2450 | 1.6300 |
| 1.6593 | 1.5162 | 2520 | 1.6292 |
| 1.7528 | 1.5584 | 2590 | 1.6285 |
| 1.6423 | 1.6005 | 2660 | 1.6279 |
| 1.5997 | 1.6426 | 2730 | 1.6272 |
| 1.6696 | 1.6847 | 2800 | 1.6266 |
| 1.7232 | 1.7268 | 2870 | 1.6260 |
| 1.5094 | 1.7690 | 2940 | 1.6254 |
| 1.853 | 1.8111 | 3010 | 1.6249 |
| 1.756 | 1.8532 | 3080 | 1.6245 |
| 1.705 | 1.8953 | 3150 | 1.6240 |
| 1.6894 | 1.9374 | 3220 | 1.6237 |
| 1.5937 | 1.9795 | 3290 | 1.6235 |
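Validation loss decreases steadily from 1.7604 at step 1 to 1.6235 at step 3290. The short matplotlib sketch below (illustrative only) plots a subset of the (step, validation loss) pairs from the table above:

```python
# Illustrative plot of selected (step, validation loss) points copied from the table above.
import matplotlib.pyplot as plt

steps = [1, 350, 700, 1050, 1400, 1750, 2100, 2450, 2800, 3290]
val_loss = [1.7604, 1.6738, 1.6585, 1.6493, 1.6441, 1.6386, 1.6340, 1.6300, 1.6266, 1.6235]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("step")
plt.ylabel("validation loss")
plt.title("32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1 eval loss")
plt.show()
```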
Framework Versions
- Transformers 4.51.3
- Pytorch 2.4.1+cu121
- Datasets 3.5.0
- Tokenizers 0.21.1
📄 License
This project is licensed under the MIT license.