🚀 wiki_42
wiki_42 is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It reaches a loss of 2.1322 on the evaluation set.
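To put the reported evaluation loss in more familiar terms, it can be converted to a perplexity. The sketch below assumes the loss is a mean token-level cross-entropy measured in nats, which is typical for language-model training but is not stated in this card; the function name `perplexity` is illustrative.

```python
import math

# Evaluation loss reported in this card.
EVAL_LOSS = 2.1322


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats per token) to perplexity."""
    return math.exp(loss)


eval_perplexity = perplexity(EVAL_LOSS)
print(f"perplexity ≈ {eval_perplexity:.2f}")  # prints "perplexity ≈ 8.43"
```

If the loss was computed differently (for example, averaged per sequence rather than per token), this conversion does not apply.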
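🔧 Technical Details
Training hyperparameters
The following hyperparameters were used during training:
- Learning rate: 0.0001
- Training batch size: 16
- Evaluation batch size: 16
- Random seed: 42
- Gradient accumulation steps: 2
- Total training batch size: 32
- Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-08)
- Learning rate scheduler type: linear
- Learning rate scheduler warmup steps: 40000
- Training steps: 100000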
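The hyperparameters above imply two derived quantities: the effective batch size and the learning rate at any given step. The following is a minimal sketch, not the training code itself. It assumes a single device (the card does not state the device count, though 16 × 2 = 32 is consistent with one) and the common linear schedule that ramps up from zero during warmup and then decays linearly to zero, as in Transformers' `get_linear_schedule_with_warmup`. The helper names are illustrative.

```python
# Values taken from the hyperparameter list in this card.
BASE_LR = 1e-4
WARMUP_STEPS = 40_000
TOTAL_STEPS = 100_000
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer update (num_devices=1 is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        # Warmup phase: scale linearly from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay phase: scale linearly from BASE_LR down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
for step in (0, 20_000, 40_000, 70_000, 100_000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 at step 40000, so the first 40% of training runs below the nominal learning rate.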
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.5662 | 2000 | 7.6634 |
| 7.6907 | 3.1323 | 4000 | 6.6436 |
| 7.6907 | 4.6985 | 6000 | 6.5194 |
| 6.5174 | 6.2647 | 8000 | 6.4353 |
| 6.5174 | 7.8309 | 10000 | 6.3850 |
| 6.3619 | 9.3970 | 12000 | 6.3224 |
| 6.3619 | 10.9632 | 14000 | 6.2452 |
| 6.2364 | 12.5294 | 16000 | 6.2350 |
| 6.2364 | 14.0955 | 18000 | 6.1499 |
| 6.1142 | 15.6617 | 20000 | 6.1037 |
| 6.1142 | 17.2279 | 22000 | 5.9377 |
| 5.8469 | 18.7940 | 24000 | 5.4239 |
| 5.8469 | 20.3602 | 26000 | 4.9251 |
| 4.9884 | 21.9264 | 28000 | 4.3860 |
| 4.9884 | 23.4926 | 30000 | 3.9695 |
| 4.0613 | 25.0587 | 32000 | 3.6959 |
| 4.0613 | 26.6249 | 34000 | 3.4763 |
| 3.5095 | 28.1911 | 36000 | 3.3151 |
| 3.5095 | 29.7572 | 38000 | 3.1737 |
| 3.1643 | 31.3234 | 40000 | 3.0445 |
| 3.1643 | 32.8896 | 42000 | 2.9430 |
| 2.9042 | 34.4558 | 44000 | 2.8326 |
| 2.9042 | 36.0219 | 46000 | 2.7730 |
| 2.6997 | 37.5881 | 48000 | 2.7106 |
| 2.6997 | 39.1543 | 50000 | 2.6288 |
| 2.5476 | 40.7204 | 52000 | 2.5801 |
| 2.5476 | 42.2866 | 54000 | 2.5604 |
| 2.4341 | 43.8528 | 56000 | 2.4778 |
| 2.4341 | 45.4190 | 58000 | 2.4722 |
| 2.3354 | 46.9851 | 60000 | 2.4329 |
| 2.3354 | 48.5513 | 62000 | 2.3689 |
| 2.2574 | 50.1175 | 64000 | 2.3834 |
| 2.2574 | 51.6836 | 66000 | 2.3362 |
| 2.1901 | 53.2498 | 68000 | 2.3154 |
| 2.1901 | 54.8160 | 70000 | 2.3113 |
| 2.1297 | 56.3821 | 72000 | 2.2685 |
| 2.1297 | 57.9483 | 74000 | 2.2540 |
| 2.0848 | 59.5145 | 76000 | 2.2351 |
| 2.0848 | 61.0807 | 78000 | 2.2405 |
| 2.0416 | 62.6468 | 80000 | 2.2016 |
| 2.0416 | 64.2130 | 82000 | 2.2054 |
| 2.004 | 65.7792 | 84000 | 2.1990 |
| 2.004 | 67.3453 | 86000 | 2.1745 |
| 1.9701 | 68.9115 | 88000 | 2.1781 |
| 1.9701 | 70.4777 | 90000 | 2.1825 |
| 1.9457 | 72.0439 | 92000 | 2.1373 |
| 1.9457 | 73.6100 | 94000 | 2.1161 |
| 1.9247 | 75.1762 | 96000 | 2.1504 |
| 1.9247 | 76.7424 | 98000 | 2.1297 |
| 1.9142 | 78.3085 | 100000 | 2.1322 |
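The epoch and step values in the final row of the results (epoch 78.3085 at step 100000) give a rough estimate of the training set size, which the card does not report. This sketch assumes that each step is one optimizer update and that every update consumes the total training batch size of 32. The result is an estimate only; the variable names are illustrative.

```python
# Final logged row of the training results and the total batch size from this card.
FINAL_STEP = 100_000
FINAL_EPOCH = 78.3085
TOTAL_BATCH_SIZE = 32

# Optimizer steps needed for one pass over the training data.
steps_per_epoch = FINAL_STEP / FINAL_EPOCH

# Approximate number of training examples seen in one pass.
examples_per_epoch = steps_per_epoch * TOTAL_BATCH_SIZE

print(f"steps per epoch    ≈ {steps_per_epoch:.0f}")
print(f"examples per epoch ≈ {examples_per_epoch:.0f}")
```

Under these assumptions, one epoch is about 1,277 steps, which suggests a training set on the order of 40,000 examples.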
Framework versions
- Transformers 4.45.2
- PyTorch 2.5.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
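To compare a local environment against the versions listed above, the standard-library `importlib.metadata` module can report what is installed. This is a minimal sketch: `check_versions` is a hypothetical helper, the keys are the PyPI distribution names (PyTorch is published as `torch`), and the exact-match comparison is deliberately strict, so a different CUDA build of PyTorch will show as a mismatch.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.45.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}


def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Return a status string per package: 'ok', 'missing', or a mismatch note."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        if installed == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (installed {installed})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED_VERSIONS).items():
        print(f"{package}: {status}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the environment recorded at training time.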