Wav2vec2-xls-r-300m-Japanese Open-source Model - Achieve Accurate Japanese Speech-to-Text for Free

Wav2vec2 Xls R 300m Japanese

Developed by AndrewMcDowell

This is an automatic speech recognition (ASR) model fine-tuned on the Japanese Common Voice 8.0 dataset based on facebook/wav2vec2-xls-r-300m, supporting Japanese speech-to-text functionality.

Speech Recognition

Transformers

Language: Japanese | Open Source License: Apache-2.0 | #Japanese Speech Recognition #Hiragana Output #Low CER Optimization

Downloads 24

Release Time : 3/2/2022

Model Overview

This model is optimized for Japanese speech recognition and converts Japanese speech into Hiragana and Katakana text. Because written Japanese does not separate words with spaces, Word Error Rate (WER) is not a reliable measure here, so the model is evaluated primarily by Character Error Rate (CER).
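To make the CER metric concrete, here is a minimal sketch of how it can be computed: the character-level edit distance between the reference and the hypothesis, divided by the reference length. The function names `edit_distance` and `cer` are illustrative and are not taken from the model's evaluation script, which may also normalize the text before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five gives a CER of 0.2 (20%).
print(cer("こんにちは", "こんにちわ"))
```

Because the score is computed over characters, it needs no word segmentation, which is why it suits unspaced Japanese text.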

Model Features

Japanese-specific Optimization

Specially trained and optimized for Japanese speech characteristics, supporting Hiragana and Katakana output

Kanji-to-Kana Conversion

Uses the pykakasi library to convert Kanji to Hiragana, simplifying the recognition task

Large-scale Pretraining Foundation

Fine-tuned based on facebook's wav2vec2-xls-r-300m model, featuring powerful speech feature extraction capabilities
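The Kanji-to-Kana step listed among the features above can be sketched roughly as follows. The card does not show the author's preprocessing code, so this is only an assumption about how pykakasi might be applied: it uses the library's `kakasi().convert()` call, which returns one dictionary per text segment with a `"hira"` entry. The helper name `to_hiragana` and its `converter` parameter are hypothetical.

```python
def to_hiragana(text, converter=None):
    """Return `text` with every segment replaced by its Hiragana reading.

    `converter` is any object with a pykakasi-style convert() method;
    when omitted, a pykakasi converter is created (assumes pykakasi is installed).
    """
    if converter is None:
        import pykakasi  # third-party package, imported lazily
        converter = pykakasi.kakasi()
    # convert() yields one dict per segment; "hira" holds the Hiragana reading.
    return "".join(segment["hira"] for segment in converter.convert(text))
```

Mapping training transcripts to kana in this way shrinks the output vocabulary from thousands of Kanji to a small set of kana characters, which is what the feature description means by simplifying the recognition task.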

Model Capabilities

Japanese Speech Recognition

Speech-to-Text

Continuous Speech Processing

Use Cases

Speech Transcription

Japanese Speech Transcription

Convert Japanese speech content into text format

Achieves 23.64% CER on the Common Voice 8.0 test set

Voice Assistants

Japanese Voice Command Recognition

Transcribe Japanese voice commands so that a downstream system can interpret them

🚀 XLS-R-300-m

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - JA dataset. It targets automatic speech recognition (speech-to-text) for Japanese.
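A minimal sketch of transcribing an audio file with this model through the Hugging Face Transformers `pipeline` API is given below. It assumes `transformers` and a PyTorch backend are installed and that audio decoding (ffmpeg) is available. The model ID is the one used in the evaluation commands on this page, and the default chunk and stride lengths mirror the values passed to `eval.py` there. The helper name `transcribe` is hypothetical.

```python
MODEL_ID = "AndrewMcDowell/wav2vec2-xls-r-300m-japanese"


def transcribe(audio_path: str,
               chunk_length_s: float = 5.0,
               stride_length_s: float = 1.0) -> str:
    """Transcribe one audio file to kana text (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; third-party dependency

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    # Long recordings are split into overlapping chunks and the results merged.
    result = asr(audio_path,
                 chunk_length_s=chunk_length_s,
                 stride_length_s=stride_length_s)
    return result["text"]
```

Calling `transcribe("sample_ja.wav")`, where the file name is a placeholder, should return the recognized text as a single string.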

📚 Documentation

Model Information

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-xls-r-300m
Training Data	mozilla-foundation/common_voice_8_0
Languages	Japanese (ja); German (de) appears only as an evaluation set

Model Performance

The model has been evaluated on multiple datasets, and the following are the key performance metrics:

Common Voice 8 (ja):
- Test WER: 95.82%
- Test CER: 23.64%
Robust Speech Event - Dev Data (de):
- Test WER: 100.0%
- Test CER: 30.99%
Robust Speech Event - Dev Data (ja):
- Test CER: 30.37%
Robust Speech Event - Test Data (ja):
- Test CER: 34.42%

Training and Evaluation

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 7.5e-05
train_batch_size: 48
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 50.0
mixed_precision_training: Native AMP
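To illustrate how the learning rate, linear scheduler, and warmup settings above interact, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 to the peak over the warmup steps, then a linear decay to 0. The total step count is not stated on this page. `TOTAL_STEPS = 10_600` is an estimate derived from the training results (step 10000 falls at epoch 47.17, about 212 steps per epoch over 50 epochs). The function name `linear_lr` is illustrative.

```python
BASE_LR = 7.5e-05      # learning_rate from the list above
WARMUP_STEPS = 2000    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 10_600   # ASSUMPTION: estimated, not stated in the source


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps, never below 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for s in (0, 1000, 2000, 6300, 10_600):
    print(s, linear_lr(s))
```

Under these assumptions the rate peaks at step 2000, which lines up with the sharp drop in validation loss between steps 1000 and 2000 in the training results.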

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer
4.0974	4.72	1000	4.0178	1.9535
2.1276	9.43	2000	0.9301	1.2128
1.7622	14.15	3000	0.7103	1.5527
1.6397	18.87	4000	0.6729	1.4269
1.5468	23.58	5000	0.6087	1.2497
1.4885	28.3	6000	0.5786	1.3222
1.451	33.02	7000	0.5726	1.3768
1.3912	37.74	8000	0.5518	1.2497
1.3617	42.45	9000	0.5352	1.2694
1.3113	47.17	10000	0.5228	1.2781

Framework Versions

Transformers 4.17.0.dev0
Pytorch 1.10.2+cu102
Datasets 1.18.2.dev0
Tokenizers 0.11.0

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with split test

python ./eval.py --model_id AndrewMcDowell/wav2vec2-xls-r-300m-japanese --dataset mozilla-foundation/common_voice_8_0 --config ja --split test --log_outputs

To evaluate on speech-recognition-community-v2/dev_data with split validation

python ./eval.py --model_id AndrewMcDowell/wav2vec2-xls-r-300m-japanese --dataset speech-recognition-community-v2/dev_data --config de --split validation --chunk_length_s 5.0 --stride_length_s 1.0

📄 License

This model is released under the Apache 2.0 license.
