Output
This model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the None dataset. It achieves a loss of 0.6440 on the evaluation set.
Quick Start
This section provides a brief introduction to the model and its performance on the evaluation set.
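Below is a minimal usage sketch. It assumes the checkpoint keeps the masked-language-modeling head of its base model (this card does not state the task explicitly), and the Hub id is a placeholder that must be replaced with this repository's actual name.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Placeholder id: substitute the actual Hugging Face Hub name of this checkpoint.
model_id = "your-username/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask example in Portuguese, assuming an MLM objective as in
# neuralmind/bert-base-portuguese-cased (which uses the [MASK] token).
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("O réu foi condenado ao pagamento de [MASK]."))
```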
Documentation
Model description
This model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the None dataset.
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 15.0
- mixed_precision_training: Native AMP
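For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments`; the output directory is illustrative, everything else mirrors the list above (the effective batch size of 128 assumes a single device: 16 × 8 accumulation steps).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",          # placeholder, not taken from the original run
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=8,  # 16 x 8 = total train batch size of 128
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=15.0,
    fp16=True,                      # "Native AMP" mixed precision
)
```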
Training results
| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 1.1985        | 0.22  | 2500   | 1.0940          |
| 1.0937        | 0.44  | 5000   | 1.0033          |
| 1.0675        | 0.66  | 7500   | 0.9753          |
| 1.0565        | 0.87  | 10000  | 0.9801          |
| 1.0244        | 1.09  | 12500  | 0.9526          |
| 0.9943        | 1.31  | 15000  | 0.9298          |
| 0.9799        | 1.53  | 17500  | 0.9035          |
| 0.95          | 1.75  | 20000  | 0.8835          |
| 0.933         | 1.97  | 22500  | 0.8636          |
| 0.9079        | 2.18  | 25000  | 0.8507          |
| 0.8938        | 2.4   | 27500  | 0.8397          |
| 0.8781        | 2.62  | 30000  | 0.8195          |
| 0.8647        | 2.84  | 32500  | 0.8088          |
| 0.8422        | 3.06  | 35000  | 0.7954          |
| 0.831         | 3.28  | 37500  | 0.7871          |
| 0.8173        | 3.5   | 40000  | 0.7721          |
| 0.8072        | 3.71  | 42500  | 0.7611          |
| 0.8011        | 3.93  | 45000  | 0.7532          |
| 0.7828        | 4.15  | 47500  | 0.7431          |
| 0.7691        | 4.37  | 50000  | 0.7367          |
| 0.7659        | 4.59  | 52500  | 0.7292          |
| 0.7606        | 4.81  | 55000  | 0.7245          |
| 0.8082        | 5.02  | 57500  | 0.7696          |
| 0.8114        | 5.24  | 60000  | 0.7695          |
| 0.8022        | 5.46  | 62500  | 0.7613          |
| 0.7986        | 5.68  | 65000  | 0.7558          |
| 0.8018        | 5.9   | 67500  | 0.7478          |
| 0.782         | 6.12  | 70000  | 0.7435          |
| 0.7743        | 6.34  | 72500  | 0.7367          |
| 0.774         | 6.55  | 75000  | 0.7313          |
| 0.7692        | 6.77  | 77500  | 0.7270          |
| 0.7604        | 6.99  | 80000  | 0.7200          |
| 0.7468        | 7.21  | 82500  | 0.7164          |
| 0.7486        | 7.43  | 85000  | 0.7117          |
| 0.7399        | 7.65  | 87500  | 0.7043          |
| 0.7306        | 7.86  | 90000  | 0.6956          |
| 0.7243        | 8.08  | 92500  | 0.6959          |
| 0.7132        | 8.3   | 95000  | 0.6916          |
| 0.71          | 8.52  | 97500  | 0.6853          |
| 0.7128        | 8.74  | 100000 | 0.6855          |
| 0.7088        | 8.96  | 102500 | 0.6809          |
| 0.7002        | 9.18  | 105000 | 0.6784          |
| 0.6953        | 9.39  | 107500 | 0.6737          |
| 0.695         | 9.61  | 110000 | 0.6714          |
| 0.6871        | 9.83  | 112500 | 0.6687          |
| 0.7161        | 10.05 | 115000 | 0.6961          |
| 0.7265        | 10.27 | 117500 | 0.7006          |
| 0.7284        | 10.49 | 120000 | 0.6941          |
| 0.724         | 10.7  | 122500 | 0.6887          |
| 0.7266        | 10.92 | 125000 | 0.6931          |
| 0.7051        | 11.14 | 127500 | 0.6846          |
| 0.7106        | 11.36 | 130000 | 0.6816          |
| 0.7011        | 11.58 | 132500 | 0.6830          |
| 0.6997        | 11.8  | 135000 | 0.6784          |
| 0.6969        | 12.02 | 137500 | 0.6734          |
| 0.6968        | 12.23 | 140000 | 0.6709          |
| 0.6867        | 12.45 | 142500 | 0.6656          |
| 0.6925        | 12.67 | 145000 | 0.6661          |
| 0.6795        | 12.89 | 147500 | 0.6606          |
| 0.6774        | 13.11 | 150000 | 0.6617          |
| 0.6756        | 13.33 | 152500 | 0.6563          |
| 0.6728        | 13.54 | 155000 | 0.6547          |
| 0.6732        | 13.76 | 157500 | 0.6520          |
| 0.6704        | 13.98 | 160000 | 0.6492          |
| 0.6666        | 14.2  | 162500 | 0.6446          |
| 0.6615        | 14.42 | 165000 | 0.6488          |
| 0.6638        | 14.64 | 167500 | 0.6523          |
| 0.6588        | 14.85 | 170000 | 0.6415          |
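If these losses are masked-LM cross-entropy values in nats (as for the base model's pretraining objective, though this card does not say so explicitly), they can be read as a pseudo-perplexity via a simple exponential:

```python
import math

# Assumption: the reported validation loss is a cross-entropy in nats,
# so perplexity = exp(loss).
final_val_loss = 0.6440  # final evaluation loss reported above
print(f"perplexity ~ {math.exp(final_val_loss):.3f}")  # ~ 1.904
```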
Framework versions
- Transformers 4.12.5
- Pytorch 1.10.1+cu113
- Datasets 1.17.0
- Tokenizers 0.10.3
License
This project is licensed under the MIT license.
Citing & Authors
If you use our work, please cite:
```bibtex
@incollection{Viegas_2023,
  doi       = {10.1007/978-3-031-36805-9_24},
  url       = {https://doi.org/10.1007%2F978-3-031-36805-9_24},
  year      = 2023,
  publisher = {Springer Nature Switzerland},
  pages     = {349--365},
  author    = {Charles F. O. Viegas and Bruno C. Costa and Renato P. Ishii},
  title     = {{JurisBERT}: A New Approach that~Converts a~Classification Corpus into~an~{STS} One},
  booktitle = {Computational Science and Its Applications {\textendash} {ICCSA} 2023}
}
```