Wav2Vec2-XLS-R-1B Japanese Speech Recognition Model
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on a collection of public Japanese voice datasets. It aims to provide high-quality automatic speech recognition for the Japanese language.
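As a quick way to try the model, the generic Hugging Face pipeline API can run transcription end to end. A minimal sketch (the audio path `sample.wav` is a placeholder, not a file shipped with the model; any 16 kHz mono recording works):

```python
# Minimal inference sketch using the transformers ASR pipeline.
# "sample.wav" is a placeholder path for your own audio file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="vumichien/wav2vec2-xls-r-1b-japanese",
)

# Chunked decoding mirrors the evaluation settings used later in this card.
result = asr("sample.wav", chunk_length_s=5.0, stride_length_s=1.0)
print(result["text"])
```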
Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of wav2vec2-xls-r-1b for Japanese speech recognition |
| Training Data | Common Voice 7.0, JSUT, JSSS, CSS10 |
Model Results
The model has been evaluated on multiple datasets. Metrics marked "(with LM)" were obtained with language-model-boosted beam-search decoding (a sketch follows the table):
| Task | Dataset | Metrics | Value |
|------|---------|---------|-------|
| Speech Recognition | Common Voice 7.0 | Test WER (with LM) | 7.98 |
| Speech Recognition | Common Voice 7.0 | Test CER (with LM) | 3.42 |
| Speech Recognition | Common Voice 8.0 | Test WER (with LM) | 7.88 |
| Speech Recognition | Common Voice 8.0 | Test CER (with LM) | 3.35 |
| Speech Recognition | Robust Speech Event - Dev Data | Test WER (with LM) | 28.07 |
| Speech Recognition | Robust Speech Event - Dev Data | Test CER (with LM) | 16.27 |
| Automatic Speech Recognition | Robust Speech Event - Test Data | Test CER | 19.89 |
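The "(with LM)" rows reflect CTC beam-search decoding against an external n-gram language model rather than plain argmax decoding. A hedged sketch of what that looks like with transformers' Wav2Vec2ProcessorWithLM, assuming the repository bundles a kenlm language model (pyctcdecode and kenlm must be installed; if no LM ships with the repo, the plain Wav2Vec2Processor with argmax decoding applies instead):

```python
# Hedged sketch of the LM-boosted CTC decoding behind the "(with LM)" metrics.
# Assumes the model repo bundles a kenlm LM; requires pyctcdecode + kenlm.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

model_id = "vumichien/wav2vec2-xls-r-1b-japanese"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

speech = np.zeros(16_000, dtype=np.float32)  # placeholder: 1 s of silence at 16 kHz
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode runs beam search against the bundled language model.
transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
print(transcription)
```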
Benchmark Results
Figures: WER results and CER results (benchmark charts not included in this card).
Usage Examples
Evaluation
To evaluate the model on Common Voice 7.0, install the Japanese text-processing dependencies and run the evaluation script:

```bash
pip install mecab-python3 unidic-lite pykakasi
python eval.py --model_id vumichien/wav2vec2-xls-r-1b-japanese --dataset mozilla-foundation/common_voice_7_0 --config ja --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs
```
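Japanese is written without spaces, so a word error rate can only be computed after word segmentation; that is why the evaluation pulls in mecab-python3, unidic-lite, and pykakasi. A sketch of the segmentation step (eval.py's exact preprocessing may differ):

```python
# Sketch: segment Japanese text into space-separated words with MeCab so a
# word error rate can be computed. Uses mecab-python3 + unidic-lite from the
# install step above; eval.py's actual normalization may differ.
import MeCab

tagger = MeCab.Tagger("-Owakati")  # wakati-gaki: space-separated token output

def to_words(text: str) -> str:
    return tagger.parse(text).strip()

print(to_words("今日はいい天気です"))  # -> "今日 は いい 天気 です"
```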
Technical Details
Training Hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
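These settings map directly onto transformers.TrainingArguments. The original training script is not included in this card, but a sketch of the equivalent configuration (output_dir is a placeholder) looks like this:

```python
# Sketch: the hyperparameters above expressed as transformers.TrainingArguments.
# The actual training script is not part of this card; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-1b-japanese",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 16 * 4 = 64 effective train batch size
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,  # "Native AMP" mixed precision
)
```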
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|---------------|-------|------|-----------------|-----|-----|
| 2.2896 | 3.37 | 1500 | 0.4748 | 0.4013 | 0.1767 |
| 1.1608 | 6.74 | 3000 | 0.3350 | 0.3159 | 0.1456 |
| 1.1042 | 10.11 | 4500 | 0.3119 | 0.2971 | 0.1400 |
| 1.0494 | 13.48 | 6000 | 0.2974 | 0.2867 | 0.1353 |
| 1.0061 | 16.85 | 7500 | 0.2802 | 0.2746 | 0.1300 |
| 0.9629 | 20.22 | 9000 | 0.2844 | 0.2776 | 0.1326 |
| 0.9267 | 23.59 | 10500 | 0.2577 | 0.2603 | 0.1255 |
| 0.8984 | 26.96 | 12000 | 0.2508 | 0.2531 | 0.1226 |
| 0.8729 | 30.34 | 13500 | 0.2629 | 0.2606 | 0.1254 |
| 0.8546 | 33.71 | 15000 | 0.2402 | 0.2447 | 0.1193 |
| 0.8304 | 37.08 | 16500 | 0.2532 | 0.2472 | 0.1209 |
| 0.8075 | 40.45 | 18000 | 0.2439 | 0.2469 | 0.1198 |
| 0.7827 | 43.82 | 19500 | 0.2387 | 0.2372 | 0.1167 |
| 0.7627 | 47.19 | 21000 | 0.2344 | 0.2331 | 0.1147 |
| 0.7402 | 50.56 | 22500 | 0.2314 | 0.2299 | 0.1135 |
| 0.718 | 53.93 | 24000 | 0.2257 | 0.2267 | 0.1114 |
| 0.7016 | 57.3 | 25500 | 0.2204 | 0.2184 | 0.1089 |
| 0.6804 | 60.67 | 27000 | 0.2227 | 0.2181 | 0.1085 |
| 0.6625 | 64.04 | 28500 | 0.2138 | 0.2112 | 0.1058 |
| 0.6465 | 67.42 | 30000 | 0.2141 | 0.2081 | 0.1044 |
| 0.6238 | 70.79 | 31500 | 0.2172 | 0.2082 | 0.1050 |
| 0.6062 | 74.16 | 33000 | 0.2174 | 0.2058 | 0.1043 |
| 0.588 | 77.53 | 34500 | 0.2156 | 0.2034 | 0.1027 |
| 0.5722 | 80.9 | 36000 | 0.2162 | 0.2032 | 0.1029 |
| 0.5585 | 84.27 | 37500 | 0.2156 | 0.2022 | 0.1021 |
| 0.5456 | 87.64 | 39000 | 0.2126 | 0.1993 | 0.1009 |
| 0.5325 | 91.01 | 40500 | 0.2121 | 0.1966 | 0.1003 |
| 0.5229 | 94.38 | 42000 | 0.2104 | 0.1941 | 0.0991 |
| 0.5134 | 97.75 | 43500 | 0.2108 | 0.1948 | 0.0992 |
Framework Versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
License
This project is licensed under the Apache 2.0 license.