# wiki_13
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves a loss of 2.1200 on the evaluation set.
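Assuming the reported evaluation loss is a mean token-level cross-entropy (the usual convention for language-model training), it corresponds to a perplexity of roughly 8.3. A minimal sketch of that conversion:

```python
import math

# Reported evaluation loss; assumed to be a mean token-level cross-entropy.
eval_loss = 2.1200

# Under that assumption, perplexity is simply the exponential of the loss.
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ≈ 8.33
```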
## Documentation
### Model description

More information needed.

### Intended uses & limitations

More information needed.

### Training and evaluation data

More information needed.
### Training procedure

#### Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
| --- | --- |
| learning_rate | 0.0001 |
| train_batch_size | 16 |
| eval_batch_size | 16 |
| seed | 13 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 40000 |
| training_steps | 100000 |
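The `linear` scheduler with 40000 warmup steps ramps the learning rate from 0 up to 0.0001 over the first 40000 steps, then decays it linearly back to 0 at step 100000. The effective batch size of 32 is likewise the product of `train_batch_size` (16) and `gradient_accumulation_steps` (2). A small illustrative reimplementation of that schedule (Transformers provides this behavior via its linear scheduler; this sketch is for clarity only):

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=40_000, total_steps=100_000):
    """Linear warmup to base_lr, then linear decay down to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

linear_lr(20_000)   # 5e-05 — halfway through warmup
linear_lr(40_000)   # 0.0001 — peak learning rate
linear_lr(100_000)  # 0.0 — end of training
```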
#### Training results
| Training Loss | Epoch | Step | Validation Loss |
| --- | --- | --- | --- |
| No log | 1.5662 | 2000 | 7.6811 |
| 7.6872 | 3.1323 | 4000 | 6.6323 |
| 7.6872 | 4.6985 | 6000 | 6.5117 |
| 6.5145 | 6.2647 | 8000 | 6.4461 |
| 6.5145 | 7.8309 | 10000 | 6.3755 |
| 6.3572 | 9.3970 | 12000 | 6.3094 |
| 6.3572 | 10.9632 | 14000 | 6.2737 |
| 6.2339 | 12.5294 | 16000 | 6.2091 |
| 6.2339 | 14.0955 | 18000 | 6.1964 |
| 6.1124 | 15.6617 | 20000 | 6.1261 |
| 6.1124 | 17.2279 | 22000 | 5.9661 |
| 5.9136 | 18.7940 | 24000 | 5.5933 |
| 5.9136 | 20.3602 | 26000 | 5.0449 |
| 5.109 | 21.9264 | 28000 | 4.4819 |
| 5.109 | 23.4926 | 30000 | 4.0711 |
| 4.1502 | 25.0587 | 32000 | 3.7598 |
| 4.1502 | 26.6249 | 34000 | 3.4685 |
| 3.5347 | 28.1911 | 36000 | 3.3009 |
| 3.5347 | 29.7572 | 38000 | 3.1496 |
| 3.1576 | 31.3234 | 40000 | 3.0139 |
| 3.1576 | 32.8896 | 42000 | 2.9557 |
| 2.8847 | 34.4558 | 44000 | 2.8395 |
| 2.8847 | 36.0219 | 46000 | 2.7659 |
| 2.6809 | 37.5881 | 48000 | 2.6953 |
| 2.6809 | 39.1543 | 50000 | 2.6246 |
| 2.5261 | 40.7204 | 52000 | 2.5583 |
| 2.5261 | 42.2866 | 54000 | 2.5142 |
| 2.4073 | 43.8528 | 56000 | 2.4925 |
| 2.4073 | 45.4190 | 58000 | 2.4343 |
| 2.3129 | 46.9851 | 60000 | 2.4278 |
| 2.3129 | 48.5513 | 62000 | 2.3707 |
| 2.23 | 50.1175 | 64000 | 2.3806 |
| 2.23 | 51.6836 | 66000 | 2.3299 |
| 2.1662 | 53.2498 | 68000 | 2.3031 |
| 2.1662 | 54.8160 | 70000 | 2.2718 |
| 2.1093 | 56.3821 | 72000 | 2.2745 |
| 2.1093 | 57.9483 | 74000 | 2.2610 |
| 2.0596 | 59.5145 | 76000 | 2.2490 |
| 2.0596 | 61.0807 | 78000 | 2.1928 |
| 2.0165 | 62.6468 | 80000 | 2.1660 |
| 2.0165 | 64.2130 | 82000 | 2.1797 |
| 1.9818 | 65.7792 | 84000 | 2.1873 |
| 1.9818 | 67.3453 | 86000 | 2.1384 |
| 1.9505 | 68.9115 | 88000 | 2.1419 |
| 1.9505 | 70.4777 | 90000 | 2.1471 |
| 1.9231 | 72.0439 | 92000 | 2.1419 |
| 1.9231 | 73.6100 | 94000 | 2.1390 |
| 1.9072 | 75.1762 | 96000 | 2.1414 |
| 1.9072 | 76.7424 | 98000 | 2.1240 |
| 1.8894 | 78.3085 | 100000 | 2.1200 |
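The Epoch column also lets us back out an approximate dataset size, which the card itself does not state: 2000 optimizer steps cover about 1.5662 epochs, so one epoch is roughly 1277 steps, or about 40.9k training samples at the effective batch size of 32. A rough derivation (these figures are estimates, not from the card):

```python
steps_per_eval = 2000
epochs_per_eval = 1.5662       # epoch value at step 2000, from the table above
total_train_batch_size = 32    # from the hyperparameters table

steps_per_epoch = steps_per_eval / epochs_per_eval
approx_dataset_size = steps_per_epoch * total_train_batch_size
print(round(steps_per_epoch))      # ≈ 1277 steps per epoch
print(round(approx_dataset_size))  # ≈ 40863 training samples
```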
#### Framework versions
| Property | Details |
| --- | --- |
| Transformers | 4.45.2 |
| Pytorch | 2.5.1+cu124 |
| Datasets | 3.0.1 |
| Tokenizers | 0.20.1 |