
Wav2vec2 Phoneme

Developed by Bluecast

A speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, with a focus on phoneme recognition

Speech Recognition

Transformers

Open Source License: Apache-2.0
Tags: #Speech Recognition #Low Word Error Rate #Multilingual Support

Downloads 189

Release Date: April 24, 2024

Model Overview

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53; the fine-tuning dataset is not specified. It targets speech recognition, with an emphasis on phoneme-level output.

Model Features

Efficient Phoneme Recognition

Fine-tuned for phoneme recognition; the final checkpoint reaches a word error rate (WER) of 12.81% on the evaluation set

Based on Large-scale Pre-trained Model

Fine-tuned from facebook/wav2vec2-large-xlsr-53, so it reuses that model's pre-trained multilingual speech representations

Lightweight Fine-tuning

Fine-tuned for 30 epochs with an effective batch size of 32 (batch size 16 with 2 gradient accumulation steps) and native mixed-precision training

Model Capabilities

Speech Recognition

Phoneme Level Analysis

Audio Feature Extraction

Use Cases

Speech Processing

Speech Transcription

Convert speech content into text format

Word Error Rate 12.81%
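The transcription use case can be sketched with the Transformers library. This is a minimal sketch under stated assumptions, not the author's documented usage: the repository id below is a placeholder, the checkpoint is assumed to load as a CTC model with a bundled processor (the usual setup for wav2vec2 fine-tunes), and librosa is used only as one convenient way to load audio at 16 kHz.

```python
TARGET_SAMPLE_RATE = 16_000  # wav2vec2 XLSR-53 checkpoints expect 16 kHz mono audio

# Hypothetical repository id; replace it with the actual location of the checkpoint.
MODEL_ID = "your-namespace/wav2vec2-Phoneme"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the greedy CTC decoding of a single audio file."""
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    import librosa
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    # Assumption: the repository ships a processor (feature extractor + tokenizer).
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCTC.from_pretrained(model_id)
    model.eval()

    # librosa resamples to the requested rate and returns a mono float array.
    waveform, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE)
    inputs = processor(waveform, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: best token per frame, then the tokenizer collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe("sample.wav") downloads the checkpoint on first use; the file name here is only an example.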

Phoneme Analysis

Identify phoneme components in speech
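To illustrate how frame-level predictions become a phoneme sequence, here is a small sketch of greedy CTC decoding. It assumes the model uses a CTC output layer, which is typical for wav2vec2 fine-tunes but is not stated in this card. The blank index and the three-phoneme vocabulary are hypothetical; the real mapping comes from the model's tokenizer.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumed index of the CTC blank token

# Hypothetical toy vocabulary; the real one ships with the model's tokenizer.
ID_TO_PHONEME = {1: "k", 2: "æ", 3: "t"}


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[str]:
    """Collapse per-frame argmax predictions into a phoneme sequence."""
    # 1. Pick the highest-scoring token id for every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Merge consecutive repeats of the same id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks and map the remaining ids to phoneme symbols.
    return [ID_TO_PHONEME[token_id] for token_id in collapsed if token_id != BLANK_ID]


# Six frames over a vocabulary of four ids (blank, k, æ, t).
frames = [
    [0.10, 0.70, 0.10, 0.10],  # k
    [0.10, 0.60, 0.20, 0.10],  # k (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # æ
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.20, 0.10, 0.10, 0.60],  # t (repeat, merged)
]
print(ctc_greedy_decode(frames))  # ['k', 'æ', 't']
```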

Educational Technology

Pronunciation Assessment

Used for evaluating pronunciation accuracy in language learning
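One possible way to turn phoneme output into pronunciation feedback is to align the predicted phonemes with a reference pronunciation and report the spans that differ. The sketch below uses Python's standard difflib for the alignment; it is an illustration of the idea rather than a method described by the model's authors, and the phoneme sequences are made-up examples.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Mismatch = Tuple[str, List[str], List[str]]


def pronunciation_diff(reference: List[str], predicted: List[str]) -> List[Mismatch]:
    """List the spans where the predicted phonemes deviate from the reference."""
    matcher = SequenceMatcher(a=reference, b=predicted, autojunk=False)
    return [
        (tag, reference[i1:i2], predicted[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / delete / insert spans
    ]


# Hypothetical example: the learner says "sink" where "think" was expected.
expected = ["θ", "ɪ", "ŋ", "k"]
spoken = ["s", "ɪ", "ŋ", "k"]
print(pronunciation_diff(expected, spoken))  # [('replace', ['θ'], ['s'])]
```

Each tuple holds the edit type, the reference phonemes, and the phonemes that were actually recognized, which is enough to highlight the mispronounced sound for a learner.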

license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: wav2vec2-Phoneme
  results: []

wav2vec2-Phoneme

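This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on an unknown dataset. It achieves the following results on the evaluation set. These figures match the final checkpoints in the training log; the lowest validation WER in that log is 0.0840, at step 3700.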

Loss: 0.2842
Wer: 0.1281
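For reference, WER is the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference tokens. The sketch below computes it over whitespace-separated tokens. The card does not say how transcripts were tokenized; if they are space-separated phonemes, this figure is effectively a phoneme error rate, which is an assumption here.

```python
from typing import List


def edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Levenshtein distance over tokens, using a rolling one-row table."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            cost = 0 if ref_token == hyp_token else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total edits divided by total reference tokens."""
    total_edits = 0
    total_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_tokens += len(ref_tokens)
    return total_edits / total_tokens


# One substitution (b -> x) and one deletion (d) over four reference tokens.
print(word_error_rate(["a b c d"], ["a x c"]))  # 0.5
```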

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 30
mixed_precision_training: Native AMP
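Two of these settings are easier to read as arithmetic. The sketch below shows how the total batch size of 32 follows from the per-device batch size and gradient accumulation (which implies a single device), and how a linear schedule with 500 warmup steps shapes the learning rate. The total step count is not stated in the card; it is estimated here from the training log, where step 10100 corresponds to epoch 29.8375.

```python
LEARNING_RATE = 1e-4
WARMUP_STEPS = 500
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
NUM_EPOCHS = 30

# Assumption: extrapolated from the training log, not a value stated in the card.
ESTIMATED_TOTAL_STEPS = round(10100 / 29.8375 * NUM_EPOCHS)


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation * num_devices


def linear_lr(step: int, total_steps: int = ESTIMATED_TOTAL_STEPS) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 32
for step in (0, 250, 500, 4000, ESTIMATED_TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under this estimate the learning rate peaks at step 500, which is roughly halfway through the second epoch in the log.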

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
2.1769	0.2954	100	2.1463	0.9564
2.1285	0.5908	200	2.0959	0.9575
1.8989	0.8863	300	1.5997	0.9022
1.1123	1.1817	400	0.6782	0.4093
0.618	1.4771	500	0.3548	0.1544
0.4993	1.7725	600	0.3039	0.1331
0.4425	2.0679	700	0.2688	0.1169
0.363	2.3634	800	0.2419	0.1108
0.3507	2.6588	900	0.2220	0.1039
0.3282	2.9542	1000	0.1999	0.1001
0.2887	3.2496	1100	0.2044	0.0974
0.3104	3.5451	1200	0.1950	0.0994
0.2976	3.8405	1300	0.2005	0.0969
0.2617	4.1359	1400	0.1907	0.0962
0.2783	4.4313	1500	0.1886	0.0936
0.2533	4.7267	1600	0.1845	0.0938
0.2501	5.0222	1700	0.1759	0.0926
0.2261	5.3176	1800	0.1789	0.0896
0.2112	5.6130	1900	0.1824	0.0891
0.2162	5.9084	2000	0.1715	0.0886
0.2098	6.2038	2100	0.1761	0.0902
0.2133	6.4993	2200	0.1747	0.0896
0.2174	6.7947	2300	0.1753	0.0892
0.2033	7.0901	2400	0.1729	0.0886
0.2167	7.3855	2500	0.1749	0.0889
0.2001	7.6809	2600	0.1650	0.0874
0.1874	7.9764	2700	0.1656	0.0872
0.1846	8.2718	2800	0.1674	0.0873
0.1927	8.5672	2900	0.1595	0.0863
0.1672	8.8626	3000	0.1552	0.0849
0.1741	9.1581	3100	0.1659	0.0868
0.1753	9.4535	3200	0.1615	0.0862
0.1825	9.7489	3300	0.1623	0.0862
0.166	10.0443	3400	0.1584	0.0865
0.1762	10.3397	3500	0.1573	0.0850
0.1744	10.6352	3600	0.1537	0.0863
0.1786	10.9306	3700	0.1522	0.0840
0.1731	11.2260	3800	0.1645	0.0851
0.1929	11.5214	3900	0.1785	0.0851
0.2047	11.8168	4000	0.1844	0.0860
0.255	12.1123	4100	0.2305	0.0911
0.2771	12.4077	4200	0.2311	0.0886
0.2742	12.7031	4300	0.2605	0.0901
0.3879	12.9985	4400	0.2886	0.0965
0.3655	13.2939	4500	0.2897	0.0933
0.3693	13.5894	4600	0.2936	0.0960
0.3999	13.8848	4700	0.2905	0.1059
0.4286	14.1802	4800	0.3424	0.1025
0.574	14.4756	4900	0.3891	0.1135
0.5753	14.7710	5000	0.3912	0.1276
0.5225	15.0665	5100	0.4248	0.1151
0.4785	15.3619	5200	0.3332	0.1287
0.5733	15.6573	5300	0.3999	0.1261
0.5471	15.9527	5400	0.4144	0.1293
0.5527	16.2482	5500	0.3580	0.1160
0.6322	16.5436	5600	0.5158	0.1794
0.6867	16.8390	5700	0.4731	0.1411
0.606	17.1344	5800	0.3812	0.1305
0.5376	17.4298	5900	0.3505	0.1206
0.5035	17.7253	6000	0.3251	0.1199
0.469	18.0207	6100	0.3092	0.1172
0.4544	18.3161	6200	0.3030	0.1185
0.4288	18.6115	6300	0.2915	0.1183
0.4457	18.9069	6400	0.2834	0.1203
0.408	19.2024	6500	0.2765	0.1212
0.4182	19.4978	6600	0.2741	0.1205
0.4117	19.7932	6700	0.2705	0.1209
0.4131	20.0886	6800	0.2725	0.1230
0.4034	20.3840	6900	0.2713	0.1218
0.4048	20.6795	7000	0.2707	0.1226
0.4199	20.9749	7100	0.2695	0.1221
0.4286	21.2703	7200	0.2709	0.1239
0.3968	21.5657	7300	0.2699	0.1230
0.4071	21.8612	7400	0.2705	0.1254
0.4178	22.1566	7500	0.2701	0.1252
0.396	22.4520	7600	0.2702	0.1252
0.4255	22.7474	7700	0.2701	0.1249
0.4239	23.0428	7800	0.2716	0.1254
0.4153	23.3383	7900	0.2729	0.1264
0.4265	23.6337	8000	0.2726	0.1264
0.4221	23.9291	8100	0.2737	0.1266
0.4268	24.2245	8200	0.2751	0.1269
0.4207	24.5199	8300	0.2761	0.1273
0.3872	24.8154	8400	0.2764	0.1273
0.4004	25.1108	8500	0.2786	0.1276
0.4096	25.4062	8600	0.2798	0.1276
0.4542	25.7016	8700	0.2803	0.1274
0.4361	25.9970	8800	0.2818	0.1276
0.4454	26.2925	8900	0.2826	0.1277
0.4204	26.5879	9000	0.2842	0.1281
0.4423	26.8833	9100	0.2841	0.1280
0.4333	27.1787	9200	0.2845	0.1282
0.4036	27.4742	9300	0.2844	0.1281
0.4203	27.7696	9400	0.2844	0.1281
0.4321	28.0650	9500	0.2842	0.1281
0.4251	28.3604	9600	0.2842	0.1281
0.4122	28.6558	9700	0.2841	0.1281
0.424	28.9513	9800	0.2841	0.1280
0.4404	29.2467	9900	0.2842	0.1281
0.4174	29.5421	10000	0.2842	0.1281
0.4432	29.8375	10100	0.2842	0.1281
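The log shows validation WER falling to 0.0840 at step 3700, rising sharply between roughly steps 4000 and 5700, and settling near 0.128 by the end. A short script makes it easy to locate the best checkpoint in a log like this. The sketch below parses a handful of rows copied from the table above; pasting the full table into the same string works the same way.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float


# A few rows copied from the training results table (whitespace-separated).
LOG_EXCERPT = """
0.1672 8.8626 3000 0.1552 0.0849
0.1744 10.6352 3600 0.1537 0.0863
0.1786 10.9306 3700 0.1522 0.0840
0.6322 16.5436 5600 0.5158 0.1794
0.4432 29.8375 10100 0.2842 0.1281
"""


def parse_log(text: str) -> List[LogRow]:
    """Turn whitespace-separated log lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, wer = line.split()
        rows.append(LogRow(float(train_loss), float(epoch), int(step), float(val_loss), float(wer)))
    return rows


rows = parse_log(LOG_EXCERPT)
best = min(rows, key=lambda row: row.wer)
final = rows[-1]
print(f"best:  step {best.step}, WER {best.wer:.4f}")    # best:  step 3700, WER 0.0840
print(f"final: step {final.step}, WER {final.wer:.4f}")  # final: step 10100, WER 0.1281
```

When retraining with the Transformers Trainer, the TrainingArguments options load_best_model_at_end=True, metric_for_best_model="wer", and greater_is_better=False select the lowest-WER checkpoint automatically. The card does not say whether the original run used them.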

Framework versions

Transformers 4.40.0
PyTorch 2.2.1+cu121
Datasets 2.19.1.dev0
Tokenizers 0.19.1
