🚀 wiki_13
This is a fine-tuned model on an unknown dataset, achieving a loss of 2.9591 on the evaluation set.
🚀 Quick Start
This model is a fine-tuned version of an unspecified base model on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.9591
📚 Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 40000
- training_steps: 100000
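With gradient accumulation of 2 over a per-device batch of 16, the effective train batch size is 16 × 2 = 32, matching `total_train_batch_size`. The `linear` scheduler warms the learning rate up from 0 to 1e-4 over the first 40,000 steps, then decays it linearly to 0 at step 100,000. A minimal sketch of that schedule (not the Transformers implementation itself, just the same shape):

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-4,
                       warmup_steps: int = 40_000,
                       total_steps: int = 100_000) -> float:
    """Learning rate at a given step for linear warmup + linear decay,
    using the hyperparameters listed above."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# Effective train batch size: per-device batch x gradient accumulation steps.
effective_batch = 16 * 2  # = 32
```

For example, halfway through warmup (step 20,000) the learning rate is 5e-5, and it peaks at 1e-4 exactly when warmup ends.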
Training results
| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 0.9847  | 2000   | 8.0854          |
| 8.1286        | 1.9695  | 4000   | 7.4147          |
| 8.1286        | 2.9542  | 6000   | 7.2936          |
| 7.3042        | 3.9389  | 8000   | 7.2263          |
| 7.3042        | 4.9237  | 10000  | 7.1350          |
| 7.1348        | 5.9084  | 12000  | 7.0611          |
| 7.1348        | 6.8932  | 14000  | 7.0000          |
| 6.9718        | 7.8779  | 16000  | 6.9539          |
| 6.9718        | 8.8626  | 18000  | 6.8852          |
| 6.8205        | 9.8474  | 20000  | 6.8512          |
| 6.8205        | 10.8321 | 22000  | 6.8137          |
| 6.6971        | 11.8168 | 24000  | 6.7650          |
| 6.6971        | 12.8016 | 26000  | 6.6483          |
| 6.5488        | 13.7863 | 28000  | 6.5099          |
| 6.5488        | 14.7710 | 30000  | 6.2472          |
| 6.2179        | 15.7558 | 32000  | 5.9238          |
| 6.2179        | 16.7405 | 34000  | 5.3578          |
| 5.4765        | 17.7253 | 36000  | 5.0209          |
| 5.4765        | 18.7100 | 38000  | 4.7463          |
| 4.8038        | 19.6947 | 40000  | 4.5390          |
| 4.8038        | 20.6795 | 42000  | 4.3029          |
| 4.341         | 21.6642 | 44000  | 4.1737          |
| 4.341         | 22.6489 | 46000  | 4.0038          |
| 3.993         | 23.6337 | 48000  | 3.8794          |
| 3.993         | 24.6184 | 50000  | 3.7730          |
| 3.74          | 25.6032 | 52000  | 3.6758          |
| 3.74          | 26.5879 | 54000  | 3.6050          |
| 3.5482        | 27.5726 | 56000  | 3.5573          |
| 3.5482        | 28.5574 | 58000  | 3.4807          |
| 3.4039        | 29.5421 | 60000  | 3.4149          |
| 3.4039        | 30.5268 | 62000  | 3.3689          |
| 3.2796        | 31.5116 | 64000  | 3.3317          |
| 3.2796        | 32.4963 | 66000  | 3.2805          |
| 3.1856        | 33.4810 | 68000  | 3.2562          |
| 3.1856        | 34.4658 | 70000  | 3.2052          |
| 3.1083        | 35.4505 | 72000  | 3.1827          |
| 3.1083        | 36.4353 | 74000  | 3.1513          |
| 3.0408        | 37.4200 | 76000  | 3.1234          |
| 3.0408        | 38.4047 | 78000  | 3.0981          |
| 2.9838        | 39.3895 | 80000  | 3.0862          |
| 2.9838        | 40.3742 | 82000  | 3.0890          |
| 2.939         | 41.3589 | 84000  | 3.0375          |
| 2.939         | 42.3437 | 86000  | 3.0297          |
| 2.8967        | 43.3284 | 88000  | 3.0112          |
| 2.8967        | 44.3131 | 90000  | 2.9907          |
| 2.8682        | 45.2979 | 92000  | 2.9836          |
| 2.8682        | 46.2826 | 94000  | 3.0020          |
| 2.8445        | 47.2674 | 96000  | 2.9588          |
| 2.8445        | 48.2521 | 98000  | 2.9804          |
| 2.8208        | 49.2368 | 100000 | 2.9591          |
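Assuming the loss is the standard per-token natural-log cross-entropy (as in Transformers language-model training), the final validation loss of 2.9591 corresponds to a perplexity of exp(2.9591) ≈ 19.3. A quick check:

```python
import math

# Final validation loss from the last row of the table above.
final_val_loss = 2.9591

# Perplexity is the exponential of the per-token cross-entropy loss.
perplexity = math.exp(final_val_loss)
print(round(perplexity, 2))  # ≈ 19.28
```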
Framework versions
- Transformers 4.45.2
- Pytorch 2.5.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1