sst-t5-base
This project presents a fine-tuned model based on t5-base, trained on the sst dataset. On the evaluation set, its validation loss and mean squared error (MSE) stay below 0.02 across all 20 training epochs.
Quick Start
This model is a fine-tuned version of [t5-base](https://huggingface.co/t5-base) on the sst dataset.
It achieves the following results on the evaluation set (values taken from the final epoch of the training results):
- Loss: 0.0185
- Mse: 0.0185
Documentation
Training and Evaluation
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
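The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a single device and zero warmup steps (neither is stated in this card), and the helper names `effective_batch_size` and `linear_lr` are hypothetical, not part of any library. It shows how `total_train_batch_size` follows from the per-device batch size and gradient accumulation, and how a linear schedule would decay the learning rate over the run.

```python
# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 5e-05
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 267  # Step value reported at epoch 1.0 in the training results

# Assumptions (not stated in this card): one device, no warmup.
NUM_DEVICES = 1
WARMUP_STEPS = 0


def effective_batch_size(per_device, accumulation, devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_batch = effective_batch_size(
    TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
)
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

print(total_batch)  # 32, matching total_train_batch_size
print(total_steps)  # 5340, matching the final Step in the training results
print(linear_lr(total_steps // 2, LEARNING_RATE, total_steps, WARMUP_STEPS))
```

Under these assumptions the learning rate at the midpoint of training is roughly half the initial value, and it reaches zero at the last step.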
Training results
| Training Loss | Epoch | Step | Validation Loss | Mse |
|---|---|---|---|---|
| No log | 1.0 | 267 | 0.0196 | 0.0196 |
| 0.0237 | 2.0 | 534 | 0.0179 | 0.0179 |
| 0.0237 | 3.0 | 801 | 0.0174 | 0.0174 |
| 0.0133 | 4.0 | 1068 | 0.0182 | 0.0182 |
| 0.0133 | 5.0 | 1335 | 0.0181 | 0.0181 |
| 0.0101 | 6.0 | 1602 | 0.0180 | 0.0180 |
| 0.0101 | 7.0 | 1869 | 0.0183 | 0.0183 |
| 0.0083 | 8.0 | 2136 | 0.0188 | 0.0188 |
| 0.0083 | 9.0 | 2403 | 0.0185 | 0.0186 |
| 0.0067 | 10.0 | 2670 | 0.0187 | 0.0187 |
| 0.0067 | 11.0 | 2937 | 0.0184 | 0.0184 |
| 0.0057 | 12.0 | 3204 | 0.0186 | 0.0186 |
| 0.0057 | 13.0 | 3471 | 0.0194 | 0.0194 |
| 0.005 | 14.0 | 3738 | 0.0175 | 0.0176 |
| 0.0045 | 15.0 | 4005 | 0.0182 | 0.0182 |
| 0.0045 | 16.0 | 4272 | 0.0183 | 0.0183 |
| 0.0041 | 17.0 | 4539 | 0.0187 | 0.0187 |
| 0.0041 | 18.0 | 4806 | 0.0186 | 0.0186 |
| 0.0038 | 19.0 | 5073 | 0.0188 | 0.0188 |
| 0.0038 | 20.0 | 5340 | 0.0185 | 0.0185 |
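To make the trend in these results easier to read, the following sketch picks out the epoch with the lowest validation loss and compares it with the final epoch. The numbers are copied from the Validation Loss column of the training results; the variable names are illustrative, and nothing here is claimed to reflect how the published checkpoint was selected.

```python
# (epoch, validation_loss) pairs copied from the training results.
VALIDATION_LOSS = [
    (1, 0.0196), (2, 0.0179), (3, 0.0174), (4, 0.0182), (5, 0.0181),
    (6, 0.0180), (7, 0.0183), (8, 0.0188), (9, 0.0185), (10, 0.0187),
    (11, 0.0184), (12, 0.0186), (13, 0.0194), (14, 0.0175), (15, 0.0182),
    (16, 0.0183), (17, 0.0187), (18, 0.0186), (19, 0.0188), (20, 0.0185),
]

# Epoch with the smallest validation loss.
best_epoch, best_loss = min(VALIDATION_LOSS, key=lambda row: row[1])
# Last logged epoch.
final_epoch, final_loss = VALIDATION_LOSS[-1]

print(best_epoch, best_loss)    # 3 0.0174
print(final_epoch, final_loss)  # 20 0.0185
```

The validation loss is lowest at epoch 3 and then fluctuates between roughly 0.0175 and 0.0194, while the logged training loss keeps decreasing, so later epochs do not appear to improve the evaluation metric.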
Framework versions
- Transformers 4.37.0
- Pytorch 1.13.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.2
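When reproducing results, it can help to confirm that the local environment matches these versions. The snippet below is a minimal sketch using only the standard library; the pip distribution names (for example `torch` for Pytorch) are an assumption, and `installed_version` and `find_mismatches` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card. The distribution names are assumed
# ("Pytorch" is assumed to be installed under the name "torch").
EXPECTED = {
    "transformers": "4.37.0",
    "torch": "1.13.1+cu117",
    "datasets": "2.15.0",
    "tokenizers": "0.15.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {found}")
```

An empty result means every listed package is installed at exactly the version given above; newer versions may still work but are not covered by this card.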
License
The model is released under the Apache 2.0 license.
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of t5-base on the sst dataset |
| Training Data | sst |