# Wav2Vec2 XLS-R 300M Korean LM
Wav2Vec2 XLS-R 300M Korean LM is an automatic speech recognition model that addresses the challenge of accurately transcribing Korean speech. Built on the XLS-R architecture, it offers high-precision speech recognition for a wide range of Korean speech applications.
## 🚀 Quick Start
This README provides detailed information about the Wav2Vec2 XLS-R 300M Korean LM model, including its architecture, training process, and evaluation results.
## ✨ Features
- Based on the XLS-R architecture: leverages the XLS-R architecture for effective speech feature extraction.
- Fine-tuned on the Zeroth Korean dataset: optimized for the Korean language using the Zeroth Korean dataset.
- Incorporates a 5-gram language model: improves recognition accuracy by integrating a 5-gram language model trained on the Korean subset of Open Subtitles (see the decoding sketch below).
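In the `transformers` ecosystem, LM-boosted CTC decoding of this kind is typically handled by `Wav2Vec2ProcessorWithLM`, which wraps a pyctcdecode beam-search decoder around the CTC tokenizer. Below is a minimal sketch; the Hub repository ID is an assumption (the original README does not state it), and it presumes the repository bundles the KenLM decoder files.

```python
import soundfile as sf
import torch
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

# Assumed Hub repository ID; adjust to wherever the checkpoint is hosted.
# A repo with an "-lm" suffix conventionally bundles the KenLM decoder files.
model_id = "w11wo/wav2vec2-xls-r-300m-korean-lm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load a 16 kHz mono recording of Korean speech (placeholder filename).
speech, sampling_rate = sf.read("sample_korean.wav")

inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Beam-search decoding, rescored by the 5-gram language model.
transcription = processor.batch_decode(logits.numpy()).text[0]
print(transcription)
```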
## 📦 Installation
No specific installation steps are provided in the original README.
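In practice, running the examples in this README requires the Hugging Face stack; something like `pip install transformers torch soundfile pyctcdecode kenlm` should suffice, though this package set is an assumption rather than an official requirement (pyctcdecode and kenlm are only needed for the language-model decoder).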
## 💻 Usage Examples
No code examples are provided in the original README.
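For a quick transcription test, the high-level `pipeline` API is the simplest entry point. As above, the repository ID is an assumed Hub path, not one confirmed by the original README:

```python
import soundfile as sf
from transformers import pipeline

# Assumed Hub repository ID (see the note in the Features section).
asr = pipeline(
    "automatic-speech-recognition",
    model="w11wo/wav2vec2-xls-r-300m-korean-lm",
)

# Load a 16 kHz mono recording of Korean speech and transcribe it.
speech, sampling_rate = sf.read("sample_korean.wav")
result = asr({"raw": speech, "sampling_rate": sampling_rate})
print(result["text"])
```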
## 📚 Documentation
### Model

| Property | Details |
|----------|---------|
| Model type | wav2vec2-xls-r-300m-korean-lm |
| Training data | Zeroth Korean Dataset |
| Parameters | 300M |
| Architecture | XLS-R |
### Evaluation Results

#### Without Language Model

| Dataset | WER | CER |
|---------|-----|-----|
| Zeroth Korean | 29.54% | 9.53% |
| Robust Speech Event - Dev Data | 76.26% | 38.67% |

#### With Language Model

| Dataset | WER | CER |
|---------|-----|-----|
| Zeroth Korean | 30.94% | 7.97% |
| Robust Speech Event - Dev Data | 68.34% | 37.08% |
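For reference, WER and CER figures like those above are commonly computed with the `jiwer` package. The snippet below is an illustration of the metrics, not the authors' evaluation script; the transcripts are placeholders.

```python
import jiwer

# Placeholder transcripts for illustration.
reference = "오늘 날씨가 좋다"   # ground-truth transcript
hypothesis = "오늘 날씨가 조타"  # model output

print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")  # word-level error rate
print(f"CER: {jiwer.cer(reference, hypothesis):.2%}")  # character-level error rate
```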
### Training procedure

The language model was not involved in training; it was added afterward for decoding. The results below come from training the underlying automatic speech recognition model.
#### Training hyperparameters

The following hyperparameters were used during training:
- `learning_rate`: 7.5e-05
- `train_batch_size`: 8
- `eval_batch_size`: 8
- `seed`: 42
- `gradient_accumulation_steps`: 4
- `total_train_batch_size`: 32
- `optimizer`: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_steps`: 2000
- `num_epochs`: 50.0
- `mixed_precision_training`: Native AMP
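As a rough guide, these hyperparameters map onto `transformers`' `TrainingArguments` roughly as follows. This is a reconstruction under standard API conventions, not the authors' actual training script; Adam's betas and epsilon match the optimizer defaults, and the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-korean",  # placeholder path
    learning_rate=7.5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=50.0,
    fp16=True,                       # Native AMP mixed-precision training
)
```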
#### Training results

| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|---------------|-------|------|-----------------|-----|-----|
| 19.7138 | 0.72 | 500 | 19.6427 | 1.0 | 1.0 |
| 4.8039 | 1.44 | 1000 | 4.7842 | 1.0 | 1.0 |
| 4.5619 | 2.16 | 1500 | 4.5608 | 0.9992 | 0.9598 |
| 4.254 | 2.88 | 2000 | 4.2729 | 0.9955 | 0.9063 |
| 4.1905 | 3.6 | 2500 | 4.2257 | 0.9903 | 0.8758 |
| 4.0683 | 4.32 | 3000 | 3.9294 | 0.9937 | 0.7911 |
| 3.486 | 5.04 | 3500 | 2.7045 | 1.0012 | 0.5934 |
| 2.946 | 5.75 | 4000 | 1.9691 | 0.9425 | 0.4634 |
| 2.634 | 6.47 | 4500 | 1.5212 | 0.8807 | 0.3850 |
| 2.4066 | 7.19 | 5000 | 1.2551 | 0.8177 | 0.3601 |
| 2.2651 | 7.91 | 5500 | 1.0423 | 0.7650 | 0.3039 |
| 2.1828 | 8.63 | 6000 | 0.9599 | 0.7273 | 0.3106 |
| 2.1023 | 9.35 | 6500 | 0.9482 | 0.7161 | 0.3063 |
| 2.0536 | 10.07 | 7000 | 0.8242 | 0.6767 | 0.2860 |
| 1.9803 | 10.79 | 7500 | 0.7643 | 0.6563 | 0.2637 |
| 1.9468 | 11.51 | 8000 | 0.7319 | 0.6441 | 0.2505 |
| 1.9178 | 12.23 | 8500 | 0.6937 | 0.6320 | 0.2489 |
| 1.8515 | 12.95 | 9000 | 0.6443 | 0.6053 | 0.2196 |
| 1.8083 | 13.67 | 9500 | 0.6286 | 0.6122 | 0.2148 |
| 1.819 | 14.39 | 10000 | 0.6015 | 0.5986 | 0.2074 |
| 1.7684 | 15.11 | 10500 | 0.5682 | 0.5741 | 0.1982 |
| 1.7195 | 15.83 | 11000 | 0.5385 | 0.5592 | 0.2007 |
| 1.7044 | 16.55 | 11500 | 0.5362 | 0.5524 | 0.2097 |
| 1.6879 | 17.27 | 12000 | 0.5119 | 0.5489 | 0.2083 |
| 1.656 | 17.98 | 12500 | 0.4990 | 0.5362 | 0.1968 |
| 1.6122 | 18.7 | 13000 | 0.4561 | 0.5092 | 0.1900 |
| 1.5919 | 19.42 | 13500 | 0.4778 | 0.5225 | 0.1975 |
| 1.5896 | 20.14 | 14000 | 0.4563 | 0.5098 | 0.1859 |
| 1.5589 | 20.86 | 14500 | 0.4362 | 0.4940 | 0.1725 |
| 1.5353 | 21.58 | 15000 | 0.4140 | 0.4826 | 0.1580 |
| 1.5441 | 22.3 | 15500 | 0.4031 | 0.4742 | 0.1550 |
| 1.5116 | 23.02 | 16000 | 0.3916 | 0.4748 | 0.1545 |
| 1.4731 | 23.74 | 16500 | 0.3841 | 0.4810 | 0.1542 |
| 1.4647 | 24.46 | 17000 | 0.3752 | 0.4524 | 0.1475 |
| 1.4328 | 25.18 | 17500 | 0.3587 | 0.4476 | 0.1461 |
| 1.4129 | 25.9 | 18000 | 0.3429 | 0.4242 | 0.1366 |
| 1.4062 | 26.62 | 18500 | 0.3450 | 0.4251 | 0.1355 |
| 1.3928 | 27.34 | 19000 | 0.3297 | 0.4145 | 0.1322 |
| 1.3906 | 28.06 | 19500 | 0.3210 | 0.4185 | 0.1336 |
| 1.358 | 28.78 | 20000 | 0.3131 | 0.3970 | 0.1275 |
| 1.3445 | 29.5 | 20500 | 0.3069 | 0.3920 | 0.1276 |
| 1.3159 | 30.22 | 21000 | 0.3035 | 0.3961 | 0.1255 |
| 1.3044 | 30.93 | 21500 | 0.2952 | 0.3854 | 0.1242 |
| 1.3034 | 31.65 | 22000 | 0.2966 | 0.3772 | 0.1227 |
| 1.2963 | 32.37 | 22500 | 0.2844 | 0.3706 | 0.1208 |
| 1.2765 | 33.09 | 23000 | 0.2841 | 0.3567 | 0.1173 |
| 1.2438 | 33.81 | 23500 | 0.2734 | 0.3552 | 0.1137 |
| 1.2487 | 34.53 | 24000 | 0.2703 | 0.3502 | 0.1118 |
| 1.2249 | 35.25 | 24500 | 0.2650 | 0.3484 | 0.1142 |
| 1.2229 | 35.97 | 25000 | 0.2584 | 0.3374 | 0.1097 |
| 1.2374 | 36.69 | 25500 | 0.2568 | 0.3337 | 0.1095 |
| 1.2153 | 37.41 | 26000 | 0.2494 | 0.3327 | 0.1071 |
| 1.1925 | 38.13 | 26500 | 0.2518 | 0.3366 | 0.1077 |
| 1.1908 | 38.85 | 27000 | 0.2437 | 0.3272 | 0.1057 |
| 1.1858 | 39.57 | 27500 | 0.2396 | 0.3265 | 0.1044 |
| 1.1808 | 40.29 | 28000 | 0.2373 | 0.3156 | 0.1028 |
| 1.1842 | 41.01 | 28500 | 0.2356 | 0.3152 | 0.1026 |
| 1.1668 | 41.73 | 29000 | 0.2319 | 0.3188 | 0.1025 |
| 1.1448 | 42.45 | 29500 | 0.2293 | 0.3099 | 0.0995 |
| 1.1327 | 43.17 | 30000 | 0.2265 | 0.3047 | 0.0979 |
| 1.1307 | 43.88 | 30500 | 0.2222 | 0.3078 | 0.0989 |
| 1.1419 | 44.6 | 31000 | 0.2215 | 0.3038 | 0.0981 |
| 1.1231 | 45.32 | 31500 | 0.2193 | 0.3013 | 0.0972 |
| 1.139 | 46.04 | 32000 | 0.2162 | 0.3007 | 0.0968 |
| 1.1114 | 46.76 | 32500 | 0.2122 | 0.2982 | 0.0960 |
| 1.111 | 47.48 | 33000 | 0.2125 | 0.2946 | 0.0948 |
| 1.0982 | 48.2 | 33500 | 0.2099 | 0.2957 | 0.0953 |
| 1.109 | 48.92 | 34000 | 0.2092 | 0.2955 | 0.0955 |
| 1.0905 | 49.64 | 34500 | 0.2088 | 0.2954 | 0.0953 |
## Disclaimer

⚠️ Important Note: consider that biases from the pre-training datasets may carry over into the model's results.
## Authors

Wav2Vec2 XLS-R 300M Korean LM was trained and evaluated by Wilson Wongso. All computation and development were done on OVH Cloud.
## Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.2.dev0
- Tokenizers 0.10.3
## 📄 License

This model is licensed under the Apache-2.0 license.