🚀 32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1
This model is a fine-tuned version of [THUDM/GLM-4-32B-Base-0414](https://huggingface.co/THUDM/GLM-4-32B-Base-0414) on the Dans-DiscountModels/pretokenization-test-4 dataset. It achieves a loss of 1.6235 on the evaluation set.
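For quick experimentation, a minimal inference sketch with 🤗 Transformers is shown below. The model id is taken from this card; the prompt, generation settings, and device mapping are illustrative placeholders, not part of the original card.

```python
# Illustrative usage sketch; prompt and generation settings are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Dans-DiscountModels/32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # training was done in bf16
    device_map="auto",    # requires `accelerate`; spreads the 32B weights across available GPUs
)

prompt = "Write a short greeting."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```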

See axolotl config
axolotl version: `0.10.0.dev0`
```yaml
base_model: THUDM/GLM-4-32B-Base-0414
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code:
wandb_project: 32b-glm4-dans-personality-engine
wandb_watch:
wandb_run_id: V1.3.0-1-4
wandb_log_model:
hub_model_id: Dans-DiscountModels/32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1
hub_strategy: "every_save"
hf_use_auth_token: true
output_dir: ./32b-glm4-dans-personality-engine
save_safetensors: true
datasets:
  - path: Dans-DiscountModels/pretokenization-test-4
    ds_type: parquet
    type:
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: false
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true
load_in_8bit: false
load_in_4bit: false
strict: false
dataset_prepared_path: ./32b-glm4-dans-personality-engine-data
val_set_size: 0.003
sequence_len: 32768
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
gradient_checkpointing: unsloth
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: ademamix_8bit
optim_args: "beta1=0.9,beta2=0.999,beta3=0.999,alpha=5"
lr_scheduler: rex
learning_rate: 0.000008
cosine_min_lr_ratio:
weight_decay: 0
max_grad_norm: 0.001
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: false
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 24
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 8
save_total_limit: 1
debug: false
deepspeed: /alloc/pocketdoc/axolotl/deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:
special_tokens:
```
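Assuming the config above is saved locally as `config.yaml` (a hypothetical filename), a short PyYAML sketch like the one below can be used to inspect the key training settings programmatically:

```python
# Illustrative sketch only: load the axolotl config above (saved as config.yaml, an assumed path)
# and print the settings most relevant to reproducing the run.
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["base_model"])                    # THUDM/GLM-4-32B-Base-0414
print(cfg["sequence_len"])                  # 32768
print(cfg["micro_batch_size"],              # 1
      cfg["gradient_accumulation_steps"])   # 4
print(cfg["optimizer"], cfg["optim_args"])  # ademamix_8bit with beta1/beta2/beta3/alpha args
```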
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| License | MIT |
| Base Model | THUDM/GLM-4-32B-Base-0414 |
| Tags | axolotl, generated_from_trainer |
| Datasets | Dans-DiscountModels/pretokenization-test-4 |
| Model Name | 32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1 |
Training Procedure
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: ademamix_8bit with args: beta1=0.9, beta2=0.999, beta3=0.999, alpha=5
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 332
- num_epochs: 2.0
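As a rough sanity check on the derived values (an illustrative calculation, not part of the generated report), the total train batch size and the warmup step count follow directly from the settings above:

```python
# Illustrative arithmetic only; the inputs are the hyperparameters listed above.
micro_batch_size = 1
grad_accum_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * grad_accum_steps * num_devices
print(total_train_batch_size)  # 32, matching the value reported above

# warmup_ratio is 0.1 in the config; the run spans roughly 3,320 optimizer steps over
# 2 epochs (the final logged step below is 3290 at epoch ~1.98), giving the 332 warmup steps.
approx_total_steps = 3320
print(round(0.1 * approx_total_steps))  # 332
```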
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6456 | 0.0006 | 1 | 1.7604 |
| 1.6538 | 0.0421 | 70 | 1.7472 |
| 1.668 | 0.0842 | 140 | 1.7132 |
| 1.5877 | 0.1264 | 210 | 1.6934 |
| 1.7524 | 0.1685 | 280 | 1.6815 |
| 1.6687 | 0.2106 | 350 | 1.6738 |
| 1.7986 | 0.2527 | 420 | 1.6691 |
| 1.8379 | 0.2948 | 490 | 1.6659 |
| 1.6813 | 0.3369 | 560 | 1.6633 |
| 1.6749 | 0.3791 | 630 | 1.6607 |
| 1.5746 | 0.4212 | 700 | 1.6585 |
| 1.7503 | 0.4633 | 770 | 1.6565 |
| 1.6143 | 0.5054 | 840 | 1.6545 |
| 1.6 | 0.5475 | 910 | 1.6527 |
| 1.7525 | 0.5897 | 980 | 1.6510 |
| 1.5861 | 0.6318 | 1050 | 1.6493 |
| 1.7439 | 0.6739 | 1120 | 1.6477 |
| 1.6129 | 0.7160 | 1190 | 1.6464 |
| 1.4729 | 0.7581 | 1260 | 1.6454 |
| 1.6923 | 0.8002 | 1330 | 1.6451 |
| 1.6498 | 0.8424 | 1400 | 1.6441 |
| 1.5815 | 0.8845 | 1470 | 1.6429 |
| 1.6209 | 0.9266 | 1540 | 1.6418 |
| 1.6685 | 0.9687 | 1610 | 1.6408 |
| 1.7472 | 1.0108 | 1680 | 1.6397 |
| 1.5719 | 1.0529 | 1750 | 1.6386 |
| 1.7247 | 1.0951 | 1820 | 1.6377 |
| 1.7098 | 1.1372 | 1890 | 1.6367 |
| 1.6367 | 1.1793 | 1960 | 1.6358 |
| 1.7014 | 1.2214 | 2030 | 1.6349 |
| 1.6622 | 1.2635 | 2100 | 1.6340 |
| 1.5958 | 1.3057 | 2170 | 1.6331 |
| 1.59 | 1.3478 | 2240 | 1.6322 |
| 1.6959 | 1.3899 | 2310 | 1.6314 |
| 1.6595 | 1.4320 | 2380 | 1.6308 |
| 1.6163 | 1.4741 | 2450 | 1.6300 |
| 1.6593 | 1.5162 | 2520 | 1.6292 |
| 1.7528 | 1.5584 | 2590 | 1.6285 |
| 1.6423 | 1.6005 | 2660 | 1.6279 |
| 1.5997 | 1.6426 | 2730 | 1.6272 |
| 1.6696 | 1.6847 | 2800 | 1.6266 |
| 1.7232 | 1.7268 | 2870 | 1.6260 |
| 1.5094 | 1.7690 | 2940 | 1.6254 |
| 1.853 | 1.8111 | 3010 | 1.6249 |
| 1.756 | 1.8532 | 3080 | 1.6245 |
| 1.705 | 1.8953 | 3150 | 1.6240 |
| 1.6894 | 1.9374 | 3220 | 1.6237 |
| 1.5937 | 1.9795 | 3290 | 1.6235 |
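Validation loss decreases steadily from 1.7604 at step 1 to 1.6235 at step 3290. The short matplotlib sketch below (illustrative only) plots a subset of the (step, validation loss) pairs from the table above:

```python
# Illustrative plot of selected (step, validation loss) points copied from the table above.
import matplotlib.pyplot as plt

steps = [1, 350, 700, 1050, 1400, 1750, 2100, 2450, 2800, 3290]
val_loss = [1.7604, 1.6738, 1.6585, 1.6493, 1.6441, 1.6386, 1.6340, 1.6300, 1.6266, 1.6235]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("step")
plt.ylabel("validation loss")
plt.title("32b-glm4-dans-personality-engine-v1.3.0-TestArticle-1 eval loss")
plt.show()
```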
Framework Versions
- Transformers 4.51.3
- Pytorch 2.4.1+cu121
- Datasets 3.5.0
- Tokenizers 0.21.1
📄 License
This project is licensed under the MIT license.