# Transformers Model README
This repository provides a fine-tuned version of Qwen/Qwen2.5-7B built with the transformers library, together with its performance on the evaluation set.

## Quick Start

This model is a fine-tuned version of Qwen/Qwen2.5-7B, trained on the datasets listed in the axolotl configuration below. It achieves a loss of 0.7923 on the evaluation set.
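The snippet below is a minimal inference sketch using the transformers library. The repository id is a placeholder (the published model name is not given in this card), and the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch. The repo id below is a placeholder, not the actual model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/qwen2.5-7b-finetune"  # placeholder: replace with the real repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically from the checkpoint
    device_map="auto",    # requires accelerate; remove to load on a single device manually
)

# The model was trained with the ChatML chat template, so apply_chat_template handles formatting.
messages = [{"role": "user", "content": "Summarize the water cycle in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```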

### Axolotl config

Axolotl version 0.4.1 was used. The detailed configuration is as follows:

```yaml
base_model: Qwen/Qwen2.5-7B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
    type: sharegpt
    conversation: chatml
  - path: NewEden/Kalo-Opus-Instruct-22k-Refusal-Murdered
    type: sharegpt
    conversation: chatml
  - path: Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
    type: sharegpt
    conversation: chatml
  - path: NewEden/Gryphe-Sonnet-3.5-35k-Subset
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Reasoning-1shot_ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/GU_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: Nitral-AI/Medical_Instruct-ShareGPT
    type: sharegpt
    conversation: chatml
  - path: AquaV/Resistance-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: AquaV/US-Army-Survival-Sharegpt
    type: sharegpt
    conversation: chatml
  - path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
    type: sharegpt
    conversation: chatml
chat_template: chatml
val_set_size: 0.002
output_dir: ./outputs/out
adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
wandb_project: qwen7B
wandb_entity:
wandb_watch:
wandb_name: qwen7B
wandb_log_model:
gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 2
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>
```
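All of the datasets above are loaded in ShareGPT format and rendered with the ChatML template. As an illustration only (the record below is invented, not taken from any of the listed datasets), a ShareGPT-style turn and a hand-rolled ChatML rendering of it might look like this sketch; axolotl performs the equivalent conversion internally.

```python
# Illustrative only: a ShareGPT-style record and the ChatML text it roughly corresponds to.
sample = {
    "conversations": [
        {"from": "human", "value": "What is sample packing?"},
        {"from": "gpt", "value": "Packing several short examples into one sequence to reduce padding."},
    ]
}

ROLE_MAP = {"human": "user", "gpt": "assistant"}

def to_chatml(record):
    """Render a ShareGPT-style conversation as ChatML-formatted text."""
    parts = []
    for turn in record["conversations"]:
        role = ROLE_MAP[turn["from"]]
        parts.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>")
    return "\n".join(parts)

print(to_chatml(sample))
```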
## Documentation

### Model description

This model is a fine-tuned version of Qwen/Qwen2.5-7B. More detailed information about the model has yet to be provided.

### Intended uses & limitations

More information is needed about the intended uses and limitations of this model.
### Training and evaluation data
Details about the training and evaluation data are yet to be provided.
### Training procedure

#### Training hyperparameters

The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| learning_rate | 1e-05 |
| train_batch_size | 1 |
| eval_batch_size | 1 |
| seed | 42 |
| distributed_type | multi-GPU |
| num_devices | 4 |
| gradient_accumulation_steps | 32 |
| total_train_batch_size | 128 |
| total_eval_batch_size | 4 |
| optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| lr_scheduler_type | cosine |
| lr_scheduler_warmup_steps | 46 |
| num_epochs | 2 |
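The total train batch size follows directly from the per-device settings: micro_batch_size × gradient_accumulation_steps × num_devices = 1 × 32 × 4 = 128. The short sketch below reproduces that arithmetic, along with the total step count implied by the warmup settings (an estimate derived from the numbers above, not a value stated in this card).

```python
# Arithmetic check of derived quantities from the hyperparameters above.
micro_batch_size = 1
gradient_accumulation_steps = 32
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128, matching the table

# warmup_ratio of 0.1 with 46 warmup steps implies roughly 460 optimizer steps over the
# 2 epochs, consistent with the ~406 steps logged at epoch 1.76 in the results table below.
warmup_steps = 46
approx_total_steps = warmup_steps / 0.1
print(approx_total_steps)  # ~460 (estimate)
```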
#### Training results

The training results are presented in the following table:
| Training Loss | Epoch | Step | Validation Loss |
| --- | --- | --- | --- |
| 1.0297 | 0.0043 | 1 | 1.1468 |
| 0.8512 | 0.2515 | 58 | 0.8729 |
| 0.8496 | 0.5030 | 116 | 0.8193 |
| 0.8175 | 0.7546 | 174 | 0.8033 |
| 0.7868 | 1.0041 | 232 | 0.7961 |
| 0.8119 | 1.2555 | 290 | 0.7934 |
| 0.799 | 1.5069 | 348 | 0.7926 |
| 0.7891 | 1.7583 | 406 | 0.7923 |
#### Framework versions

The versions of the frameworks used are as follows:
- Transformers 4.45.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
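A quick way to compare a local environment against this list is a version check like the sketch below; the expected values in the comments are taken from the list above, and exact pins may differ for the dev build of Transformers.

```python
# Print installed versions to compare against the framework list above.
import transformers, torch, datasets, tokenizers

print("Transformers:", transformers.__version__)  # expected 4.45.0.dev0
print("PyTorch:", torch.__version__)              # expected 2.4.0+cu121
print("Datasets:", datasets.__version__)          # expected 2.21.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.19.1
```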
## License

The project is licensed under the Apache 2.0 license.