ai-light-dance_stepmania_ft_wav2vec2-large-xlsr-53-v7 Open-source Model

Developed by gary109

An automatic speech recognition model based on wav2vec2-large-xlsr-53, fine-tuned on the GARY109/AI_LIGHT_DANCE dataset and aimed at StepMania game audio.

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Dance rhythm recognition #High-precision audio analysis #Music game adaptation

Downloads 162

Release Time: 6/30/2022

Model Overview

This automatic speech recognition (ASR) model targets StepMania game audio. It was obtained by fine-tuning wav2vec2-large-xlsr-53 and reaches a word error rate (WER) of 0.6512 on its evaluation set.

Model Features

Game audio optimization

Fine-tuned specifically on StepMania game audio data.

Fine-tuned version

Built on the wav2vec2-large-xlsr-53 model and reuses that model's pretrained audio feature extraction.

Reported word error rate

Reaches a word error rate (WER) of 0.6512 on the evaluation set, i.e. roughly 65 word-level errors per 100 reference words.

Model Capabilities

Game audio recognition

Speech-to-text

Rhythm game audio analysis

Use Cases

Game development

StepMania game audio analysis

Used to analyze the audio rhythm and content in StepMania games

Word error rate 0.6512

Speech recognition

Domain-specific speech recognition

Suitable for speech recognition tasks in specific domains such as game audio
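A minimal inference sketch for the use cases above, assuming the model is published on the Hugging Face Hub under the id gary109/ai-light-dance_stepmania_ft_wav2vec2-large-xlsr-53-v7. That id is inferred from the developer name and model name on this page and is not confirmed by it. The helper name `transcribe` is hypothetical. The sketch uses the generic `pipeline` API from Transformers. Decoding an audio file from a path typically requires ffmpeg to be installed. The page does not describe the output vocabulary, so the returned string may be a sequence of step or onset tokens rather than natural-language text.

```python
import sys

# Assumed Hub id, inferred from the developer and model name on this page.
MODEL_ID = "gary109/ai-light-dance_stepmania_ft_wav2vec2-large-xlsr-53-v7"


def transcribe(audio_path: str) -> str:
    """Run the fine-tuned model on one audio file and return the decoded text."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # Usage: python transcribe.py path/to/audio.wav
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

Building the pipeline once and reusing it across files avoids reloading the model weights on every call.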

🚀 ai-light-dance_stepmania_ft_wav2vec2-large-xlsr-53-v7

This model is a fine-tuned version of gary109/ai-light-dance_stepmania_ft_wav2vec2-large-xlsr-53-v6 on the GARY109/AI_LIGHT_DANCE - ONSET-STEPMANIA2 dataset. On the evaluation set, it achieves the following results (the values match the epoch 5 row of the training results):

Loss: 1.0424
WER: 0.6512
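For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between the reference and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this model. The function name `word_error_rate` and the single-letter tokens in the usage note are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference tokens seen so far
    # (excluding the current one) and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference tokens and returns 0.5. Because insertions are counted, the value can exceed 1.0.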

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 4e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 30.0
mixed_precision_training: Native AMP
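The learning-rate settings above imply a simple schedule: a linear ramp from 0 to 4e-06 over the first 100 steps, then a linear decay to 0 at the end of training. The sketch below reproduces that shape in plain Python. It assumes the total step count is 360930 (the final step in the training results) and that the scheduler behaves like the linear warmup schedule in Transformers. The name `linear_lr` is hypothetical.

```python
LEARNING_RATE = 4e-06   # learning_rate from the hyperparameter list
WARMUP_STEPS = 100      # lr_scheduler_warmup_steps
TOTAL_STEPS = 360930    # assumed: last step in the training results (30 x 12031)


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up proportionally to the step count.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Decay linearly so the rate reaches 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 50, 100, 180465, 360930):
        print(step, linear_lr(step))
```

With only 100 warmup steps out of 360930, the ramp covers well under 0.1% of training, so the schedule is almost entirely a linear decay.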

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.9303	1.0	12031	1.1160	0.6712
0.8181	2.0	24062	1.0601	0.6608
0.7861	3.0	36093	1.0478	0.6520
0.767	4.0	48124	1.0617	0.6526
0.797	5.0	60155	1.0424	0.6512
0.834	6.0	72186	1.0519	0.6542
0.7915	7.0	84217	1.0508	0.6494
0.8106	8.0	96248	1.0753	0.6449
0.7512	9.0	108279	1.1223	0.6592
0.777	10.0	120310	1.1201	0.6535
0.7631	11.0	132341	1.0780	0.6512
0.7465	12.0	144372	1.0822	0.6499
0.826	13.0	156403	1.0706	0.6445
0.7552	14.0	168434	1.0862	0.6449
0.8279	15.0	180465	1.1162	0.6461
0.7769	16.0	192496	1.1023	0.6420
0.7918	17.0	204527	1.1085	0.6456
0.6941	18.0	216558	1.1139	0.6417
0.7379	19.0	228589	1.1126	0.6410
0.7467	20.0	240620	1.1102	0.6369
0.8045	21.0	252651	1.1191	0.6376
0.7059	22.0	264682	1.1285	0.6381
0.7008	23.0	276713	1.1328	0.6377
0.7816	24.0	288744	1.1326	0.6366
0.7426	25.0	300775	1.1420	0.6362
0.7226	26.0	312806	1.1326	0.6350
0.665	27.0	324837	1.1419	0.6346
0.7184	28.0	336868	1.1480	0.6346
0.77	29.0	348899	1.1476	0.6343
0.727	30.0	360930	1.1494	0.6348
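Validation loss and WER do not bottom out at the same epoch in the table above, which matters when choosing a checkpoint. The sketch below copies the epoch, step, validation loss and WER columns from the table and finds the best row under each criterion. The variable names are illustrative only.

```python
# (epoch, step, validation_loss, wer), copied from the training results table.
RESULTS = [
    (1, 12031, 1.1160, 0.6712), (2, 24062, 1.0601, 0.6608),
    (3, 36093, 1.0478, 0.6520), (4, 48124, 1.0617, 0.6526),
    (5, 60155, 1.0424, 0.6512), (6, 72186, 1.0519, 0.6542),
    (7, 84217, 1.0508, 0.6494), (8, 96248, 1.0753, 0.6449),
    (9, 108279, 1.1223, 0.6592), (10, 120310, 1.1201, 0.6535),
    (11, 132341, 1.0780, 0.6512), (12, 144372, 1.0822, 0.6499),
    (13, 156403, 1.0706, 0.6445), (14, 168434, 1.0862, 0.6449),
    (15, 180465, 1.1162, 0.6461), (16, 192496, 1.1023, 0.6420),
    (17, 204527, 1.1085, 0.6456), (18, 216558, 1.1139, 0.6417),
    (19, 228589, 1.1126, 0.6410), (20, 240620, 1.1102, 0.6369),
    (21, 252651, 1.1191, 0.6376), (22, 264682, 1.1285, 0.6381),
    (23, 276713, 1.1328, 0.6377), (24, 288744, 1.1326, 0.6366),
    (25, 300775, 1.1420, 0.6362), (26, 312806, 1.1326, 0.6350),
    (27, 324837, 1.1419, 0.6346), (28, 336868, 1.1480, 0.6346),
    (29, 348899, 1.1476, 0.6343), (30, 360930, 1.1494, 0.6348),
]

best_by_loss = min(RESULTS, key=lambda row: row[2])
best_by_wer = min(RESULTS, key=lambda row: row[3])
steps_per_epoch = RESULTS[0][1]

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("lowest WER:", best_by_wer)
    print("optimizer steps per epoch:", steps_per_epoch)
```

The lowest validation loss (1.0424) falls at epoch 5, which matches the headline evaluation figures, while the lowest WER (0.6343) falls at epoch 29. Training loss keeps drifting down while validation loss rises after epoch 5, so the later WER gains come with a widening gap between training and validation loss.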

Framework versions

Transformers 4.21.0.dev0
PyTorch 1.9.1+cu102
Datasets 2.3.3.dev0
Tokenizers 0.12.1

📄 License

This project is licensed under the Apache-2.0 license.
