AI-Light-Dance_Singing2_FT Open-Source Model - Achieve Accurate Singing Voice Recognition for Free

Ai Light Dance Singing2 Ft Wav2vec2 Large Xlsr 53 V1

Developed by gary109

This model is an automatic speech recognition model fine-tuned on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset based on wav2vec2-large-xlsr-53, primarily used for singing voice recognition tasks.

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Singing voice recognition #Low word error rate #XLSR-53 fine-tuning

Downloads 185

Release Time: 6/24/2022

Model Overview

This is an automatic speech recognition model optimized for singing voice recognition. It is fine-tuned from wav2vec2-large-xlsr-53 and reaches a word error rate of 0.2905 (validation loss 0.5760) on the evaluation set of the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset.

Model Features

Singing Voice Optimization

Fine-tuned specifically on singing voice data, so it targets singing scenarios rather than general spoken-language recognition.

Efficient Training

Uses gradient accumulation (16 accumulation steps at a batch size of 10) to reach an effective training batch size of 160.

Stable Performance

Word error rate falls from 0.9265 after epoch 1 to 0.2471 after epoch 40. Validation loss reaches its minimum of 0.5760 at epoch 19 and then drifts upward to 0.6713 by the final epoch.

Model Capabilities

Singing voice recognition

Speech to text

Audio content analysis

Use Cases

Music Technology

Singing Voice to Lyrics

Automatically convert singing recordings into text lyrics

Word error rate approximately 29.05%

Music Content Analysis

Analyze lyric content in singing recordings
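The word error rate quoted in the use cases above is the word-level edit distance between the reference lyrics and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation script used for this model; the function name `word_error_rate` and the example strings are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# out of six reference words.
example_wer = word_error_rate("let it be let it be", "let it bee let it")
```

Under this definition, a word error rate of 0.2905 means roughly 29 word-level errors (substitutions, deletions, or insertions) per 100 reference words.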

🚀 ai-light-dance_singing2_ft_wav2vec2-large-xlsr-53-v1

This is a fine-tuned speech recognition model that reaches a word error rate of 0.2905 on the evaluation set of the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset.

🚀 Quick Start

This model is a fine-tuned version of [gary109/ai-light-dance_singing2_ft_wav2vec2-large-xlsr-53](https://huggingface.co/gary109/ai-light-dance_singing2_ft_wav2vec2-large-xlsr-53) on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset. It achieves the following results on the evaluation set:

Loss: 0.5760
Wer: 0.2905
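The original card does not include inference code. As a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub under the ID used in the heading above, and that `transformers`, `torch`, and `ffmpeg` are installed, transcription could look like the following. The function name `transcribe` and the file name in the usage comment are hypothetical.

```python
# Assumed Hub ID, taken from the model name in this card.
MODEL_ID = "gary109/ai-light-dance_singing2_ft_wav2vec2-large-xlsr-53-v1"


def transcribe(audio_path: str) -> str:
    """Return the decoded text for one audio file (sketch, not the author's script)."""
    # Imported lazily so that defining this function does not require
    # the heavy dependencies or a model download.
    from transformers import pipeline

    # The ASR pipeline decodes the file with ffmpeg and resamples it to the
    # sampling rate expected by the model's feature extractor.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]


# Usage (downloads the model on first call):
# print(transcribe("my_singing_clip.wav"))
```

Building the pipeline inside the function keeps the sketch short; for repeated calls it would be cheaper to construct the pipeline once and reuse it.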

📚 Documentation

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 4e-05
train_batch_size: 10
eval_batch_size: 10
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 160
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 40.0
mixed_precision_training: Native AMP
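
Two of the values above are derived rather than set directly: the total train batch size and the per-step learning rate. The sketch below reproduces both in plain Python. It assumes a single training device (consistent with 10 × 16 = 160) and takes the total step count of 4480 from the last row of the training results; the function names are illustrative, and the schedule follows the usual linear warmup then linear decay shape rather than being copied from the training script.

```python
LEARNING_RATE = 4e-05      # learning_rate
TRAIN_BATCH_SIZE = 10      # train_batch_size
GRAD_ACCUM_STEPS = 16      # gradient_accumulation_steps
WARMUP_STEPS = 100         # lr_scheduler_warmup_steps
TOTAL_STEPS = 4480         # final optimizer step in the training results


def effective_batch_size(batch_size: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer update (num_devices=1 is an assumption)."""
    return batch_size * accum_steps * num_devices


def linear_lr(step: int,
              peak: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


total_train_batch_size = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)
lr_at_peak = linear_lr(WARMUP_STEPS)   # end of warmup
lr_at_end = linear_lr(TOTAL_STEPS)     # last step
```

With these settings the learning rate climbs to 4e-05 over the first 100 optimizer steps, which is less than one epoch (112 steps), and then decays linearly for the rest of training.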

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
1.656	1.0	112	1.7625	0.9265
1.3693	2.0	224	1.5135	0.9243
1.2172	3.0	336	1.2657	0.8533
1.0456	4.0	448	1.0893	0.7691
0.9385	5.0	560	1.0110	0.7097
0.8165	6.0	672	0.9243	0.6682
0.7491	7.0	784	0.8948	0.6583
0.6772	8.0	896	0.7894	0.6007
0.6096	9.0	1008	0.7684	0.5663
0.5714	10.0	1120	0.6978	0.4826
0.5213	11.0	1232	0.8433	0.4927
0.4624	12.0	1344	0.6695	0.4469
0.4298	13.0	1456	0.6569	0.3868
0.3939	14.0	1568	0.6633	0.3694
0.3803	15.0	1680	0.6376	0.3920
0.3415	16.0	1792	0.6463	0.3414
0.3239	17.0	1904	0.5841	0.3197
0.2946	18.0	2016	0.5948	0.3112
0.2751	19.0	2128	0.5760	0.2905
0.2834	20.0	2240	0.5884	0.2975
0.2383	21.0	2352	0.5989	0.2775
0.2265	22.0	2464	0.6151	0.2853
0.2158	23.0	2576	0.5843	0.2670
0.2015	24.0	2688	0.6621	0.2738
0.215	25.0	2800	0.6068	0.2652
0.1859	26.0	2912	0.6136	0.2570
0.1745	27.0	3024	0.6191	0.2624
0.1611	28.0	3136	0.6364	0.2578
0.1513	29.0	3248	0.6402	0.2535
0.172	30.0	3360	0.6330	0.2500
0.1488	31.0	3472	0.6275	0.2521
0.1371	32.0	3584	0.6539	0.2540
0.1356	33.0	3696	0.6544	0.2491
0.1319	34.0	3808	0.6545	0.2491
0.1465	35.0	3920	0.6573	0.2495
0.13	36.0	4032	0.6594	0.2494
0.1244	37.0	4144	0.6651	0.2476
0.1228	38.0	4256	0.6754	0.2497
0.1181	39.0	4368	0.6684	0.2468
0.1338	40.0	4480	0.6713	0.2471
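
The headline results (Loss 0.5760, Wer 0.2905) match the epoch 19 row, which is the row with the lowest validation loss, not the last epoch or the lowest word error rate. The sketch below checks this against the table; the values are copied from the rows above as (epoch, validation loss, WER), and the variable names are illustrative. That the published checkpoint was selected by validation loss is an inference from the numbers, not something the card states.

```python
# (epoch, validation_loss, wer) copied from the training results table.
RESULTS = [
    (1, 1.7625, 0.9265), (2, 1.5135, 0.9243), (3, 1.2657, 0.8533), (4, 1.0893, 0.7691),
    (5, 1.0110, 0.7097), (6, 0.9243, 0.6682), (7, 0.8948, 0.6583), (8, 0.7894, 0.6007),
    (9, 0.7684, 0.5663), (10, 0.6978, 0.4826), (11, 0.8433, 0.4927), (12, 0.6695, 0.4469),
    (13, 0.6569, 0.3868), (14, 0.6633, 0.3694), (15, 0.6376, 0.3920), (16, 0.6463, 0.3414),
    (17, 0.5841, 0.3197), (18, 0.5948, 0.3112), (19, 0.5760, 0.2905), (20, 0.5884, 0.2975),
    (21, 0.5989, 0.2775), (22, 0.6151, 0.2853), (23, 0.5843, 0.2670), (24, 0.6621, 0.2738),
    (25, 0.6068, 0.2652), (26, 0.6136, 0.2570), (27, 0.6191, 0.2624), (28, 0.6364, 0.2578),
    (29, 0.6402, 0.2535), (30, 0.6330, 0.2500), (31, 0.6275, 0.2521), (32, 0.6539, 0.2540),
    (33, 0.6544, 0.2491), (34, 0.6545, 0.2491), (35, 0.6573, 0.2495), (36, 0.6594, 0.2494),
    (37, 0.6651, 0.2476), (38, 0.6754, 0.2497), (39, 0.6684, 0.2468), (40, 0.6713, 0.2471),
]

# Row with the lowest validation loss.
best_by_loss = min(RESULTS, key=lambda row: row[1])

# Row with the lowest word error rate.
best_by_wer = min(RESULTS, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]}, "
      f"loss {best_by_loss[1]:.4f}, WER {best_by_loss[2]:.4f}")
print(f"lowest WER: epoch {best_by_wer[0]}, "
      f"loss {best_by_wer[1]:.4f}, WER {best_by_wer[2]:.4f}")
```

The two criteria disagree: validation loss bottoms out at epoch 19, while word error rate keeps improving and is lowest at epoch 39 (0.2468).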

Framework versions

Transformers 4.21.0.dev0
PyTorch 1.9.1+cu102
Datasets 2.3.3.dev0
Tokenizers 0.12.1

📄 License

The model is licensed under the Apache 2.0 license.
