Viet Tones Model
Developed by StevenLe456
Vietnamese tone recognition model fine-tuned from wav2vec2-base-vietnamese-250h, reaching 59.72% accuracy on its evaluation set
Downloads: 22
Release date: 8/10/2023
Model Overview
This model focuses on tone recognition in Vietnamese speech and is suited to speech processing and analysis scenarios
Model Features
Vietnamese Tone Recognition
Classifies the six Vietnamese tones (ngang, huyền, sắc, hỏi, ngã, nặng)
Fine-tuned on Pre-trained Model
Fine-tuned from a wav2vec2 model pre-trained on 250 hours of Vietnamese speech
Model Capabilities
Vietnamese speech analysis
Tone feature extraction
Speech classification
Use Cases
Speech Processing
Vietnamese Pronunciation Assessment
Used for pronunciation accuracy evaluation in language learning applications
Identifies tones with 59.72% accuracy on the evaluation set (see the sketch below)
Speech-to-Text Preprocessing
Serves as a tone pre-processing module for Vietnamese ASR systems
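For the pronunciation-assessment use case above, the following is a minimal sketch of how the model could score a learner's tone against an expected label. The Hub id `StevenLe456/viet_tones_model`, the file name, and the tone label string are assumptions for illustration and are not confirmed by this card.

```python
from transformers import pipeline

# Assumed Hub id for this checkpoint; replace with the actual path if it differs.
classifier = pipeline("audio-classification", model="StevenLe456/viet_tones_model")

def assess_tone(audio_path: str, expected_tone: str) -> dict:
    """Predict the tone in a recording and compare it with the expected tone label."""
    top = classifier(audio_path, top_k=1)[0]
    return {
        "expected": expected_tone,
        "predicted": top["label"],
        "score": top["score"],
        "correct": top["label"] == expected_tone,
    }

# Placeholder file name and tone label, for illustration only:
# print(assess_tone("learner_recording.wav", expected_tone="sac"))
```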
viet_tones_model
This model is a fine-tuned version of [nguyenvulebinh/wav2vec2-base-vietnamese-250h](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h), adapted for the Vietnamese tone recognition task.
Quick Start
This model is a fine-tuned version of [nguyenvulebinh/wav2vec2-base-vietnamese-250h](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.9783
- Accuracy: 0.5972
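As a minimal usage sketch, assuming the checkpoint is published on the Hub as `StevenLe456/viet_tones_model` and loads with the standard audio-classification classes (an assumption; adjust the model id to the actual path):

```python
import torch
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "StevenLe456/viet_tones_model"  # assumed Hub id
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)

# wav2vec2-based checkpoints expect 16 kHz mono audio.
speech, _ = librosa.load("example.wav", sr=16_000, mono=True)
inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # predicted tone label
```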
Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Technical Details
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 110
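The hyperparameters listed above map onto a Hugging Face `TrainingArguments` configuration roughly like the sketch below; `output_dir` and the per-epoch evaluation strategy are assumptions, not stated in this card.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameters above. Adam betas=(0.9, 0.999) and
# epsilon=1e-08 are the library defaults, so they are not set explicitly.
training_args = TrainingArguments(
    output_dir="viet_tones_model",       # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,       # effective train batch size: 32 * 4 = 128
    num_train_epochs=110,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="epoch",         # assumed; per-epoch eval matches the results table
)
```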
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
No log | 0.89 | 6 | 1.7955 | 0.1296 |
1.7924 | 1.93 | 13 | 1.7938 | 0.1343 |
1.7919 | 2.96 | 20 | 1.7916 | 0.2037 |
1.7919 | 4.0 | 27 | 1.7907 | 0.1713 |
1.7903 | 4.89 | 33 | 1.7886 | 0.1852 |
1.7883 | 5.93 | 40 | 1.7798 | 0.2269 |
1.7883 | 6.96 | 47 | 1.7487 | 0.25 |
1.7717 | 8.0 | 54 | 1.7104 | 0.2407 |
1.726 | 8.89 | 60 | 1.6488 | 0.2685 |
1.726 | 9.93 | 67 | 1.5835 | 0.2731 |
1.6651 | 10.96 | 74 | 1.6020 | 0.2778 |
1.6332 | 12.0 | 81 | 1.5351 | 0.2778 |
1.6332 | 12.89 | 87 | 1.4977 | 0.2963 |
1.5708 | 13.93 | 94 | 1.4903 | 0.2870 |
1.5543 | 14.96 | 101 | 1.4671 | 0.2731 |
1.5543 | 16.0 | 108 | 1.3992 | 0.3194 |
1.4872 | 16.89 | 114 | 1.3854 | 0.3009 |
1.4861 | 17.93 | 121 | 1.3411 | 0.3426 |
1.4861 | 18.96 | 128 | 1.3142 | 0.3472 |
1.4281 | 20.0 | 135 | 1.3021 | 0.4259 |
1.38 | 20.89 | 141 | 1.2657 | 0.4028 |
1.38 | 21.93 | 148 | 1.2372 | 0.4352 |
1.3472 | 22.96 | 155 | 1.2341 | 0.4815 |
1.3029 | 24.0 | 162 | 1.1815 | 0.4306 |
1.3029 | 24.89 | 168 | 1.1797 | 0.4954 |
1.3042 | 25.93 | 175 | 1.1403 | 0.4583 |
1.281 | 26.96 | 182 | 1.1349 | 0.4722 |
1.281 | 28.0 | 189 | 1.1369 | 0.4907 |
1.2614 | 28.89 | 195 | 1.0999 | 0.4954 |
1.2133 | 29.93 | 202 | 1.1677 | 0.4676 |
1.2133 | 30.96 | 209 | 1.0785 | 0.5 |
1.2527 | 32.0 | 216 | 1.1092 | 0.4861 |
1.1722 | 32.89 | 222 | 1.0424 | 0.5185 |
1.1722 | 33.93 | 229 | 1.0791 | 0.4907 |
1.1225 | 34.96 | 236 | 1.0447 | 0.4907 |
1.1447 | 36.0 | 243 | 1.0777 | 0.4583 |
1.1447 | 36.89 | 249 | 1.0141 | 0.4954 |
1.1484 | 37.93 | 256 | 1.0196 | 0.5324 |
1.11 | 38.96 | 263 | 0.9791 | 0.5417 |
1.046 | 40.0 | 270 | 0.9798 | 0.5231 |
1.046 | 40.89 | 276 | 0.9366 | 0.5694 |
1.0582 | 41.93 | 283 | 0.9645 | 0.5602 |
1.0569 | 42.96 | 290 | 0.9764 | 0.5694 |
1.0569 | 44.0 | 297 | 1.0340 | 0.5324 |
1.028 | 44.89 | 303 | 0.9969 | 0.5463 |
1.04 | 45.93 | 310 | 1.0251 | 0.5185 |
1.04 | 46.96 | 317 | 1.0447 | 0.5417 |
0.9889 | 48.0 | 324 | 0.9487 | 0.5324 |
1.0055 | 48.89 | 330 | 1.0147 | 0.5 |
1.0055 | 49.93 | 337 | 1.0015 | 0.5046 |
0.9955 | 50.96 | 344 | 0.9763 | 0.5278 |
0.9382 | 52.0 | 351 | 1.0306 | 0.5278 |
0.9382 | 52.89 | 357 | 0.9970 | 0.5463 |
0.9601 | 53.93 | 364 | 0.9487 | 0.5741 |
0.9736 | 54.96 | 371 | 0.9658 | 0.5463 |
0.9736 | 56.0 | 378 | 0.9789 | 0.5602 |
0.9237 | 56.89 | 384 | 0.9940 | 0.5463 |
0.9588 | 57.93 | 391 | 0.9778 | 0.5463 |
0.9588 | 58.96 | 398 | 0.9789 | 0.5648 |
0.9393 | 60.0 | 405 | 0.9612 | 0.5602 |
0.9291 | 60.89 | 411 | 0.9141 | 0.5556 |
0.9291 | 61.93 | 418 | 0.9770 | 0.5463 |
0.929 | 62.96 | 425 | 0.9385 | 0.5556 |
0.9448 | 64.0 | 432 | 0.9504 | 0.5463 |
0.9448 | 64.89 | 438 | 0.9984 | 0.5463 |
0.9426 | 65.93 | 445 | 0.9228 | 0.5602 |
0.8949 | 66.96 | 452 | 0.9729 | 0.5509 |
0.8949 | 68.0 | 459 | 0.9825 | 0.5602 |
0.9041 | 68.89 | 465 | 0.9769 | 0.5509 |
0.8828 | 69.93 | 472 | 0.9914 | 0.5648 |
0.8828 | 70.96 | 479 | 0.9838 | 0.5509 |
0.8874 | 72.0 | 486 | 0.9646 | 0.5741 |
0.8723 | 72.89 | 492 | 1.0682 | 0.5324 |
0.8723 | 73.93 | 499 | 1.0629 | 0.5417 |
0.8953 | 74.96 | 506 | 0.9770 | 0.5648 |
0.879 | 76.0 | 513 | 1.0038 | 0.5787 |
0.879 | 76.89 | 519 | 1.0529 | 0.5648 |
0.896 | 77.93 | 526 | 1.0300 | 0.5602 |
0.8519 | 78.96 | 533 | 1.0451 | 0.5463 |
0.8414 | 80.0 | 540 | 1.0755 | 0.5509 |
0.8414 | 80.89 | 546 | 1.0287 | 0.5556 |
0.8342 | 81.93 | 553 | 1.0140 | 0.5602 |
0.8653 | 82.96 | 560 | 1.0787 | 0.5463 |
0.8653 | 84.0 | 567 | 1.0762 | 0.5509 |
0.8357 | 84.89 | 573 | 1.0307 | 0.5741 |
0.8455 | 85.93 | 580 | 1.0171 | 0.5648 |
0.8455 | 86.96 | 587 | 0.9886 | 0.5880 |
0.8238 | 88.0 | 594 | 0.9806 | 0.5741 |
0.8613 | 88.89 | 600 | 1.0177 | 0.5833 |
0.8613 | 89.93 | 607 | 1.0273 | 0.5602 |
0.8265 | 90.96 | 614 | 0.9857 | 0.5926 |
0.831 | 92.0 | 621 | 0.9701 | 0.5972 |
0.831 | 92.89 | 627 | 0.9726 | 0.5972 |
0.8247 | 93.93 | 634 | 0.9765 | 0.5880 |
0.8041 | 94.96 | 641 | 0.9801 | 0.5926 |
0.8041 | 96.0 | 648 | 0.9796 | 0.5926 |
0.8387 | 96.89 | 654 | 0.9790 | 0.5972 |
0.7906 | 97.78 | 660 | 0.9783 | 0.5972 |
Framework versions
- Transformers 4.31.0
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.3
License
This model is licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).