Wav2vec2-xls-r-300m-zh-CN Open Source Speech Recognition Model - Free Deployment for Precise Recognition of Mandarin Chinese

Wav2vec2 Xls R 300m Zh CN

Developed by anantoj

This automatic speech recognition (ASR) model is facebook/wav2vec2-xls-r-300m fine-tuned on the Common Voice zh-CN dataset for Mandarin Chinese recognition.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Chinese speech-to-text #High CER robustness #General speech dataset

Downloads 37

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition model for Mandarin Chinese, fine-tuned on the Common Voice zh-CN dataset. It converts spoken Mandarin into text.

Model Features

Chinese optimization

Fine-tuned specifically for Mandarin Chinese; the reported character error rate (CER) on the evaluation set is 0.2059

Based on large model

Built on the 300M-parameter wav2vec2-xls-r large model with strong speech feature extraction capabilities

General speech dataset

Trained on the Common Voice zh-CN dataset; results on the Robust Speech Event dev and test sets are listed under Evaluation Results

Model Capabilities

Chinese speech recognition

Speech-to-text

Automatic speech transcription

Use Cases

Speech transcription

Meeting minutes

Automatically convert meeting recordings into text records

Reported CER (character error rate) on the evaluation set: approximately 20.59%

Voice input

Provide voice input functionality for applications

Accessibility technology

Real-time captions

Provide real-time speech-to-text services for hearing-impaired individuals

🚀 Speech Recognition Model

This is a fine-tuned speech recognition model for transcribing Mandarin Chinese speech to text. Its reported evaluation metrics and training log are given in the sections that follow.

🚀 Quick Start

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the COMMON_VOICE - ZH-CN dataset. It achieves the following results on the evaluation set:

Loss: 0.8122
Wer: 0.8392
Cer: 0.2059
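
A minimal inference sketch follows. It assumes the checkpoint is published on the Hugging Face Hub as `anantoj/wav2vec2-xls-r-300m-zh-CN` (an ID inferred from the author and model name on this page, not confirmed by it) and that `transformers`, `torch`, and `librosa` are installed. It uses plain greedy CTC decoding, which may differ from how the scores above were produced.

```python
# Assumed Hub ID, inferred from the author and model name on this page.
MODEL_ID = "anantoj/wav2vec2-xls-r-300m-zh-CN"
SAMPLE_RATE = 16_000  # wav2vec2 XLS-R checkpoints expect 16 kHz mono audio


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding."""
    # Heavy dependencies are imported lazily, only when a transcription runs.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Load as mono float32 and resample to the rate the model expects.
    speech, _ = librosa.load(audio_path, sr=SAMPLE_RATE, mono=True)
    inputs = processor(speech, sampling_rate=SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: take the most likely token per frame, then let the
    # tokenizer collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (a hypothetical file name) downloads the checkpoint on first use and returns the predicted text as a string.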

📚 Documentation

Model Details

Property	Details
Model Type	Fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the COMMON_VOICE - ZH-CN dataset
Training Data	COMMON_VOICE - ZH-CN dataset

Evaluation Results

The model has been evaluated on the following datasets:

Robust Speech Event - Dev Data:
- Task: Automatic Speech Recognition
- Metrics: Test CER = 66.22
Robust Speech Event - Test Data:
- Task: Automatic Speech Recognition
- Metrics: Test CER = 37.51
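
CER and WER are edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. The sketch below is a generic pure-Python illustration, not the evaluation script used for this model, and the sample sentences are made up. It also shows one plausible reason a Mandarin model can report a WER far above its CER: when transcripts are not segmented with spaces, a whitespace-based WER treats a whole sentence as one word, so a single wrong character counts as a full word error.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, ignoring spaces."""
    ref_chars = list(ref.replace(" ", ""))
    hyp_chars = list(hyp.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(ref: str, hyp: str) -> float:
    """Word error rate with whitespace tokenization."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


reference = "今天天气很好"    # made-up example sentence, six characters
hypothesis = "今天天汽很好"   # same sentence with one wrong character

print(f"CER = {cer(reference, hypothesis):.4f}")
print(f"WER = {wer(reference, hypothesis):.4f}")
```

Because insertions are counted against the reference length, either ratio can exceed 1.0 when the hypothesis contains many more tokens than the reference.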

🔧 Technical Details

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 100.0
mixed_precision_training: Native AMP
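
The hyperparameters above map onto Hugging Face `TrainingArguments` fields roughly as sketched below. This is an illustration, not the author's training script. Two values are assumptions: a single training device (so that 8 × 4 gives the listed total batch size of 32) and a total of about 67,700 optimizer steps, estimated from the step and epoch columns of the training log. The schedule function re-implements the usual linear-warmup, linear-decay rule in plain Python rather than calling the library.

```python
# Keyword arguments mirroring the listed hyperparameters. The names follow
# Hugging Face `TrainingArguments`; they could be passed as
# TrainingArguments(output_dir=..., **TRAINING_KWARGS).
TRAINING_KWARGS = {
    "learning_rate": 7.5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 2000,
    "num_train_epochs": 100.0,
    "fp16": True,  # corresponds to "Native AMP" mixed precision
}

NUM_DEVICES = 1  # assumption: the listed total of 32 implies one device
effective_batch = (TRAINING_KWARGS["per_device_train_batch_size"]
                   * TRAINING_KWARGS["gradient_accumulation_steps"]
                   * NUM_DEVICES)

TOTAL_STEPS = 67_700  # assumption: estimated from the training log


def linear_lr(step: int,
              base_lr: float = TRAINING_KWARGS["learning_rate"],
              warmup: int = TRAINING_KWARGS["warmup_steps"],
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` under linear warmup then linear decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print("effective batch size:", effective_batch)
for s in (0, 1000, 2000, 35_000, TOTAL_STEPS):
    print(s, linear_lr(s))
```

Under these assumptions the learning rate climbs from 0 to 7.5e-05 over the first 2000 steps and then falls linearly to 0 at the end of training.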

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
69.215	0.74	500	74.9751	1.0	1.0
8.2109	1.48	1000	7.0617	1.0	1.0
6.4277	2.22	1500	6.3811	1.0	1.0
6.3513	2.95	2000	6.3061	1.0	1.0
6.2522	3.69	2500	6.2147	1.0	1.0
5.9757	4.43	3000	5.7906	1.1004	0.9924
5.0642	5.17	3500	4.2984	1.7729	0.8214
4.6346	5.91	4000	3.7129	1.8946	0.7728
4.267	6.65	4500	3.2177	1.7526	0.6922
3.9964	7.39	5000	2.8337	1.8055	0.6546
3.8035	8.12	5500	2.5726	2.1851	0.6992
3.6273	8.86	6000	2.3391	2.1029	0.6511
3.5248	9.6	6500	2.1944	2.3617	0.6859
3.3683	10.34	7000	1.9827	2.1014	0.6063
3.2411	11.08	7500	1.8610	1.6160	0.5135
3.1299	11.82	8000	1.7446	1.5948	0.4946
3.0574	12.56	8500	1.6454	1.1291	0.4051
2.985	13.29	9000	1.5919	1.0673	0.3893
2.9573	14.03	9500	1.4903	1.0604	0.3766
2.8897	14.77	10000	1.4614	1.0059	0.3653
2.8169	15.51	10500	1.3997	1.0030	0.3550
2.8155	16.25	11000	1.3444	0.9980	0.3441
2.7595	16.99	11500	1.2911	0.9703	0.3325
2.7107	17.72	12000	1.2462	0.9565	0.3227
2.6358	18.46	12500	1.2466	0.9955	0.3333
2.5801	19.2	13000	1.2059	1.0010	0.3226
2.5554	19.94	13500	1.1919	1.0094	0.3223
2.5314	20.68	14000	1.1703	0.9847	0.3156
2.509	21.42	14500	1.1733	0.9896	0.3177
2.4391	22.16	15000	1.1811	0.9723	0.3164
2.4631	22.89	15500	1.1382	0.9698	0.3059
2.4414	23.63	16000	1.0893	0.9644	0.2972
2.3771	24.37	16500	1.0930	0.9505	0.2954
2.3658	25.11	17000	1.0756	0.9609	0.2926
2.3215	25.85	17500	1.0512	0.9614	0.2890
2.3327	26.59	18000	1.0627	1.1984	0.3282
2.3055	27.33	18500	1.0582	0.9520	0.2841
2.299	28.06	19000	1.0356	0.9480	0.2817
2.2673	28.8	19500	1.0305	0.9367	0.2771
2.2166	29.54	20000	1.0139	0.9223	0.2702
2.2378	30.28	20500	1.0095	0.9268	0.2722
2.2168	31.02	21000	1.0001	0.9085	0.2691
2.1766	31.76	21500	0.9884	0.9050	0.2640
2.1715	32.5	22000	0.9730	0.9505	0.2719
2.1104	33.23	22500	0.9752	0.9362	0.2656
2.1158	33.97	23000	0.9720	0.9263	0.2624
2.0718	34.71	23500	0.9573	1.0005	0.2759
2.0824	35.45	24000	0.9609	0.9525	0.2643
2.0591	36.19	24500	0.9662	0.9570	0.2667
2.0768	36.93	25000	0.9528	0.9574	0.2646
2.0893	37.67	25500	0.9810	0.9169	0.2612
2.0282	38.4	26000	0.9556	0.8877	0.2528
1.997	39.14	26500	0.9523	0.8723	0.2501
2.0209	39.88	27000	0.9542	0.8773	0.2503
1.987	40.62	27500	0.9427	0.8867	0.2500
1.9663	41.36	28000	0.9546	0.9065	0.2546
1.9945	42.1	28500	0.9431	0.9119	0.2536
1.9604	42.84	29000	0.9367	0.9030	0.2490
1.933	43.57	29500	0.9071	0.8916	0.2432
1.9227	44.31	30000	0.9048	0.8882	0.2428
1.8784	45.05	30500	0.9106	0.8991	0.2437
1.8844	45.79	31000	0.8996	0.8758	0.2379
1.8776	46.53	31500	0.9028	0.8798	0.2395
1.8372	47.27	32000	0.9047	0.8778	0.2379
1.832	48.01	32500	0.9016	0.8941	0.2393
1.8154	48.74	33000	0.8915	0.8916	0.2372
1.8072	49.48	33500	0.8781	0.8872	0.2365
1.7489	50.22	34000	0.8738	0.8956	0.2340
1.7928	50.96	34500	0.8684	0.8872	0.2323
1.7748	51.7	35000	0.8723	0.8718	0.2321
1.7355	52.44	35500	0.8760	0.8842	0.2331
1.7167	53.18	36000	0.8746	0.8817	0.2324
1.7479	53.91	36500	0.8762	0.8753	0.2281
1.7428	54.65	37000	0.8733	0.8699	0.2277
1.7058	55.39	37500	0.8816	0.8649	0.2263
1.7045	56.13	38000	0.8733	0.8689	0.2297
1.709	56.87	38500	0.8648	0.8654	0.2232
1.6799	57.61	39000	0.8717	0.8580	0.2244
1.664	58.35	39500	0.8653	0.8723	0.2259
1.6488	59.08	40000	0.8637	0.8803	0.2271
1.6298	59.82	40500	0.8553	0.8768	0.2253
1.6185	60.56	41000	0.8512	0.8718	0.2240
1.574	61.3	41500	0.8579	0.8773	0.2251
1.6192	62.04	42000	0.8499	0.8743	0.2242
1.6275	62.78	42500	0.8419	0.8758	0.2216
1.5697	63.52	43000	0.8446	0.8699	0.2222
1.5384	64.25	43500	0.8462	0.8580	0.2200
1.5115	64.99	44000	0.8467	0.8674	0.2214
1.5547	65.73	44500	0.8505	0.8669	0.2204
1.5597	66.47	45000	0.8421	0.8684	0.2192
1.505	67.21	45500	0.8485	0.8619	0.2187
1.5101	67.95	46000	0.8489	0.8649	0.2204
1.5199	68.69	46500	0.8407	0.8619	0.2180
1.5207	69.42	47000	0.8379	0.8496	0.2163
1.478	70.16	47500	0.8357	0.8595	0.2163
1.4817	70.9	48000	0.8346	0.8496	0.2151
1.4827	71.64	48500	0.8362	0.8624	0.2169
1.4513	72.38	49000	0.8355	0.8451	0.2137
1.4988	73.12	49500	0.8325	0.8624	0.2161
1.4267	73.85	50000	0.8396	0.8481	0.2157
1.4421	74.59	50500	0.8355	0.8491	0.2122
1.4311	75.33	51000	0.8358	0.8476	0.2118
1.4174	76.07	51500	0.8289	0.8451	0.2101
1.4349	76.81	52000	0.8372	0.8580	0.2140
1.3959	77.55	52500	0.8325	0.8436	0.2116
1.4087	78.29	53000	0.8351	0.8446	0.2105
1.415	79.03	53500	0.8363	0.8476	0.2123
1.4122	79.76	54000	0.8310	0.8481	0.2112
1.3969	80.5	54500	0.8239	0.8446	0.2095
1.361	81.24	55000	0.8282	0.8427	0.2091
1.3611	81.98	55500	0.8282	0.8407	0.2092
1.3677	82.72	56000	0.8235	0.8436	0.2084
1.3361	83.46	56500	0.8231	0.8377	0.2069
1.3779	84.19	57000	0.8206	0.8436	0.2070
1.3727	84.93	57500	0.8204	0.8392	0.2065
1.3317	85.67	58000	0.8207	0.8436	0.2065
1.3332	86.41	58500	0.8186	0.8357	0.2055
1.3299	87.15	59000	0.8193	0.8417	0.2075
1.3129	87.89	59500	0.8183	0.8431	0.2065
1.3352	88.63	60000	0.8151	0.8471	0.2062
1.3026	89.36	60500	0.8125	0.8486	0.2067
1.3468	90.1	61000	0.8124	0.8407	0.2058
1.3028	90.84	61500	0.8122	0.8461	0.2051
1.2884	91.58	62000	0.8086	0.8427	0.2048
1.3005	92.32	62500	0.8110	0.8387	0.2055
1.2996	93.06	63000	0.8126	0.8328	0.2057
1.2707	93.8	63500	0.8098	0.8402	0.2047
1.3026	94.53	64000	0.8097	0.8402	0.2050
1.2546	95.27	64500	0.8111	0.8402	0.2055
1.2426	96.01	65000	0.8088	0.8372	0.2059
1.2869	96.75	65500	0.8093	0.8397	0.2048
1.2782	97.49	66000	0.8099	0.8412	0.2049
1.2457	98.23	66500	0.8134	0.8412	0.2062
1.2967	98.97	67000	0.8115	0.8382	0.2055
1.2817	99.7	67500	0.8128	0.8392	0.2063
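
To compare checkpoints in a log like the one above, it can help to parse the rows and rank them by CER. The sketch below embeds only four rows copied from the table (steps 63000, 63500, 64000, and 67500) to stay short; the `Row` type and the `parse` and `best_by_cer` helpers are illustrative names, not part of any library.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    wer: float
    cer: float


# Tab-separated rows copied from the training results table.
LOG = (
    "1.2996\t93.06\t63000\t0.8126\t0.8328\t0.2057\n"
    "1.2707\t93.8\t63500\t0.8098\t0.8402\t0.2047\n"
    "1.3026\t94.53\t64000\t0.8097\t0.8402\t0.2050\n"
    "1.2817\t99.7\t67500\t0.8128\t0.8392\t0.2063\n"
)


def parse(log: str) -> List[Row]:
    """Turn tab-separated log lines into typed rows."""
    rows = []
    for line in log.strip().splitlines():
        t_loss, epoch, step, v_loss, wer, cer = line.split("\t")
        rows.append(Row(float(t_loss), float(epoch), int(step),
                        float(v_loss), float(wer), float(cer)))
    return rows


def best_by_cer(rows: List[Row]) -> Row:
    """Return the row with the lowest character error rate."""
    return min(rows, key=lambda r: r.cer)


rows = parse(LOG)
best = best_by_cer(rows)
print(f"lowest CER in sample: {best.cer} at step {best.step}")
```

Across the full table, the lowest CER is also the 0.2047 logged at step 63500, slightly below the 0.2063 of the final row at step 67500.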

Framework Versions

Transformers 4.17.0.dev0
Pytorch 1.10.2+cu102
Datasets 1.18.3.dev0
Tokenizers 0.11.0

📄 License

This project is licensed under the Apache-2.0 license.
