Wav2Vec2 XLS-R Adult/Child Indonesian Speech Classifier
An audio classification model based on the XLS-R architecture for classifying adult and child Indonesian speech.
This model is a fine-tuned version of wav2vec2-xls-r-300m, trained on a private adult/child Indonesian speech classification dataset. It was trained using Hugging Face's PyTorch framework on a Tesla P100 GPU provided by Kaggle, and training metrics were logged via TensorBoard.
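A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub and loaded through the Transformers `audio-classification` pipeline. The repository id below is a placeholder (this card does not state it), the helper names are illustrative, and no label names are hard-coded because they depend on the checkpoint's configuration.

```python
from typing import Dict, List

# Placeholder: this card does not state the Hub repository id.
MODEL_ID = "<namespace>/wav2vec2-xls-r-adult-child-id-cls"


def top_prediction(scores: List[Dict]) -> Dict:
    """Return the highest-scoring entry from pipeline-style output."""
    return max(scores, key=lambda item: item["score"])


def classify(audio_path: str, model_id: str = MODEL_ID) -> Dict:
    """Classify one audio file; needs `transformers`, `torch`, and ffmpeg."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # Given a file path, the pipeline decodes the audio and resamples it to
    # the rate the feature extractor expects (16 kHz for XLS-R checkpoints).
    return top_prediction(classifier(audio_path))
```

Calling `classify("sample.wav")` with a real repository id would return a dictionary with a `label` and a `score` for the most likely class.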
⨠Features
- Based on the XLS-R architecture for audio classification.
- Fine-tuned on a private Indonesian speech dataset for adult/child classification.
- Trained using Hugging Face's PyTorch framework, with training metrics logged via TensorBoard.
Model Details
| Property | Details |
|---|---|
| Model Type | wav2vec2-xls-r-adult-child-id-cls |
| #params | 300M |
| Architecture | XLS-R |
| Training Data | Adult/Child Indonesian Speech Classification Dataset |
Evaluation Results
The model achieves the following results on evaluation:
| Dataset | Loss | Accuracy | F1 |
|---|---|---|---|
| Adult/Child Indonesian Speech Classification | 0.1970 | 93.38% | 0.9307 |
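This card does not say how the F1 score was averaged, so the following is only a sketch of how accuracy and a binary F1 score could be computed from predictions. The function name, the label names, and the toy data are all hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[str], y_pred: Sequence[str],
                    positive: str = "child") -> Tuple[float, float]:
    """Accuracy and binary F1 (for the `positive` class) from label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); set to 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy data, not taken from the evaluation set.
truth = ["adult", "child", "child", "adult", "child"]
preds = ["adult", "child", "adult", "adult", "child"]
acc, f1 = accuracy_and_f1(truth, preds)
print(f"accuracy={acc:.2f} f1={f1:.2f}")
```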
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
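The effective batch size and the shape of the learning-rate curve follow from these values. The sketch below assumes a single device, a linear warmup from zero followed by a linear decay to zero, and 1,220 total optimizer steps (the final step listed in the training results); the function name is illustrative and this is not the training code itself.

```python
import math

LEARNING_RATE = 3e-5
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
TOTAL_STEPS = 1220  # assumption: last step listed in the training results

# One optimizer step consumes batch_size * accumulation_steps examples
# (assuming a single device).
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Number of warmup steps implied by the warmup ratio.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


print("effective batch size:", total_train_batch_size)
print("warmup steps:", warmup_steps)
for step in (0, 61, 122, 671, 1220):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 3e-05 once warmup ends and reaches zero at the final step.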
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|---|
| 0.336 | 1.0 | 305 | 0.3146 | 0.8845 | 0.8698 |
| 0.2345 | 2.0 | 610 | 0.2140 | 0.9251 | 0.9202 |
| 0.3215 | 3.0 | 915 | 0.2038 | 0.9315 | 0.9286 |
| 0.2059 | 4.0 | 1220 | 0.1970 | 0.9338 | 0.9307 |
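As a reading aid, the sketch below restates the training results as data and derives a few quantities from them. The per-epoch clip count is only an approximation: it assumes the effective batch size of 32 from the hyperparameters and ignores how partial batches were handled. Variable names are illustrative.

```python
# (epoch, step, training_loss, validation_loss, accuracy, f1) per epoch,
# copied from the training results.
RESULTS = [
    (1, 305, 0.336, 0.3146, 0.8845, 0.8698),
    (2, 610, 0.2345, 0.2140, 0.9251, 0.9202),
    (3, 915, 0.3215, 0.2038, 0.9315, 0.9286),
    (4, 1220, 0.2059, 0.1970, 0.9338, 0.9307),
]

steps_per_epoch = RESULTS[0][1]
# Approximate clips seen per epoch, assuming 32 examples per optimizer step.
approx_examples_per_epoch = steps_per_epoch * 32

# Epoch with the lowest validation loss.
best = min(RESULTS, key=lambda row: row[3])
print(f"best epoch: {best[0]} (val loss {best[3]:.4f}, acc {best[4]:.2%})")

# Accuracy gain from one epoch to the next, in percentage points.
gains = [100 * (cur[4] - prev[4]) for prev, cur in zip(RESULTS, RESULTS[1:])]
for (prev, cur), gain in zip(zip(RESULTS, RESULTS[1:]), gains):
    print(f"epoch {prev[0]} -> {cur[0]}: {gain:+.2f} pp")
```

Validation loss falls in every epoch even though the logged training loss rises at epoch 3, and the accuracy gains shrink with each additional epoch.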
Disclaimer
Consider the biases from the pre-training datasets, which may carry over into this model's results.
Authors
Wav2Vec2 XLS-R Adult/Child Indonesian Speech Classifier was trained and evaluated by Ananto Joyoadikusumo. All computation and development were done on Kaggle.
Framework versions
- Transformers 4.18.0
- PyTorch 1.11.0+cu102
- Datasets 2.2.0
- Tokenizers 0.12.1
License
This project is licensed under the Apache-2.0 license.