Whisper-large-v2-mix-jp Open-source Model - Free Deployment for High-precision Japanese Speech Recognition

Whisper Large V2 Mix Jp

Developed by vumichien

An automatic speech recognition (ASR) model based on OpenAI Whisper-large-v2 and fine-tuned on Japanese speech datasets

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Japanese speech recognition #Low word error rate #Multi-dataset fine-tuning

Downloads 93

Release Time: 12/19/2022

Model Overview

This model is a Japanese-optimized version of Whisper-large-v2, fine-tuned specifically for Japanese speech recognition. On the Common Voice 11.0 Japanese evaluation set it reports a Word Error Rate (WER) of 7.65% and a Character Error Rate (CER) of 4.72%.

Model Features

Japanese Optimization

Specifically fine-tuned on JSUT, JSSS, CSS10, and Common Voice Japanese datasets to optimize Japanese speech recognition performance

Low Error Rate

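Achieves a Word Error Rate (WER) of 7.65% and a Character Error Rate (CER) of 4.72% on the Common Voice 11.0 Japanese evaluation set; on FLEURS (ja_jp) the reported figures are 11.69% WER and 7.12% CER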

Efficient Training

Trained with native automatic mixed precision (AMP) and gradient accumulation over 2 steps, giving a total training batch size of 16

Model Capabilities

Japanese speech-to-text

High-precision speech recognition

Long audio processing
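Whisper models operate on audio windows of up to 30 seconds, so long recordings are usually split into overlapping chunks before transcription. The sketch below only illustrates that idea by computing chunk boundaries in samples. The function name `chunk_spans`, the 16 kHz sample rate, and the 5-second overlap are assumptions chosen for illustration; this is not the exact algorithm used by this model or by any particular library.

```python
def chunk_spans(total_samples, sample_rate=16_000, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices of overlapping chunks.

    Hypothetical helper: chunk_s and overlap_s are illustrative defaults.
    """
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_s")

    spans = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        spans.append((start, end))
        if end == total_samples:
            break  # the last chunk reached the end of the recording
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second recording sampled at 16 kHz.
    for start, end in chunk_spans(70 * 16_000):
        print(f"{start / 16_000:.1f}s to {end / 16_000:.1f}s")
```

The overlap gives each chunk some shared context with its neighbour, which makes it easier to merge the per-chunk transcripts without cutting words at the boundaries.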

Use Cases

Speech Transcription

Japanese Meeting Minutes

Automatically convert Japanese meeting recordings into text transcripts

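Word-level accuracy of approximately 92.35% (100% minus the 7.65% WER measured on Common Voice 11.0 Japanese; accuracy on actual meeting recordings is not reported)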

Japanese Media Subtitle Generation

Automatically generate subtitles for Japanese video content

Voice Assistants

Japanese Voice Command Recognition

Used for voice command understanding in Japanese voice assistant systems

🚀 openai/whisper-large-v2

This model is a fine-tuned version of openai/whisper-large-v2, trained on the vumichien/preprocessed_jsut_jsss_css10_common_voice_11 dataset. It is designed for automatic speech recognition and is evaluated on Japanese speech datasets.
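The source page gives no usage code, so the following is only a minimal sketch of how a Whisper checkpoint like this one is typically loaded with the Transformers `pipeline` API. The repository id `vumichien/whisper-large-v2-mix-jp` is inferred from the model name and author and should be treated as an assumption, as should the file name `meeting.wav`. Passing `language` and `task` through `generate_kwargs` assumes a Transformers release newer than the 4.26.0.dev0 listed under the framework versions.

```python
# Assumed repository id, inferred from the model name and author.
MODEL_ID = "vumichien/whisper-large-v2-mix-jp"


def transcribe(audio_path, model_id=MODEL_ID, chunk_length_s=30):
    """Transcribe a Japanese audio file and return the recognized text."""
    # Imported lazily so the module can be loaded without Transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,  # lets the pipeline handle long recordings
    )
    result = asr(
        audio_path,
        # Assumes a recent Transformers version that accepts these generate options.
        generate_kwargs={"language": "japanese", "task": "transcribe"},
    )
    return result["text"]


if __name__ == "__main__":
    # "meeting.wav" is a placeholder path, not a file shipped with the model.
    print(transcribe("meeting.wav"))
```

The first call downloads the model weights, which are several gigabytes for a large-v2 checkpoint, so a GPU and sufficient disk space are advisable.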

📚 Documentation

Model Information

Property	Details
Model Type	Fine-tuned version of openai/whisper-large-v2
Training Data	vumichien/preprocessed_jsut_jsss_css10_common_voice_11
Metrics	WER, CER
Base Model	openai/whisper-large-v2

Evaluation Results

This model achieves the following results on the evaluation set:

Loss: 0.2284
Wer: 7.6453
Cer: 4.7187

The detailed evaluation results on different datasets are as follows:

mozilla-foundation/common_voice_11_0 ja:
- Task: Automatic Speech Recognition
- Metrics:
  - Wer: 7.6453
  - Cer: 4.7187
google/fleurs (ja_jp):
- Task: Automatic Speech Recognition
- Metrics:
  - Wer: 11.69
  - Cer: 7.12
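Both metrics above are edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length and expressed as a percentage. The sketch below shows that calculation in plain Python. The function names are illustrative, and stripping spaces before computing CER is an assumption. The source does not state how Japanese text was segmented into words for WER, so the generic `error_rate` function simply takes pre-tokenized lists.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent, relative to the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference, hypothesis):
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


if __name__ == "__main__":
    # Two of the five reference characters differ.
    print(cer("こんにちは", "こんばんは"))
```

For WER, the same `error_rate` function can be called with word lists produced by whatever segmenter the evaluation uses.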

Training Procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
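Two of the values above follow from the others, and a short sketch makes the relationship explicit. The total batch size is the per-step batch size times the gradient accumulation steps (assuming a single device, which the source does not state). The learning rate function mirrors the usual shape of a linear schedule with warmup: a ramp from 0 to the peak over 500 steps, then a linear decay to 0 at step 10000. The function names are illustrative.

```python
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 10_000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_step=TRAIN_BATCH_SIZE, accum=GRAD_ACCUM_STEPS):
    """Samples contributing to each optimizer update (single device assumed)."""
    return per_step * accum


def lr_at(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", effective_batch_size())
    for step in (0, 250, 500, 5250, 10_000):
        print(step, lr_at(step))
```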

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
0.1912	0.55	1000	0.1828	11.2314	7.0357
0.1329	1.1	2000	0.1618	9.4172	5.9028
0.0912	1.65	3000	0.1616	8.9257	5.4711
0.0576	2.2	4000	0.1664	8.5861	5.3055
0.0449	2.74	5000	0.1642	8.4510	5.2930
0.02	3.29	6000	0.1799	8.1537	5.0354
0.019	3.84	7000	0.1801	8.125	5.0827
0.0067	4.39	8000	0.2003	7.8412	4.8133
0.006	4.94	9000	0.2071	7.5811	4.7023
0.0022	5.49	10000	0.2284	7.6453	4.7187
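The table shows validation loss reaching its minimum early while WER and CER keep improving for much longer. A small sketch that re-enters the rows above and picks the best checkpoint per column makes this easy to check. The names `Checkpoint` and `best_by` are illustrative only.

```python
from collections import namedtuple

Checkpoint = namedtuple("Checkpoint", "step val_loss wer cer")

# Values copied from the training results table.
ROWS = [
    Checkpoint(1000, 0.1828, 11.2314, 7.0357),
    Checkpoint(2000, 0.1618, 9.4172, 5.9028),
    Checkpoint(3000, 0.1616, 8.9257, 5.4711),
    Checkpoint(4000, 0.1664, 8.5861, 5.3055),
    Checkpoint(5000, 0.1642, 8.4510, 5.2930),
    Checkpoint(6000, 0.1799, 8.1537, 5.0354),
    Checkpoint(7000, 0.1801, 8.125, 5.0827),
    Checkpoint(8000, 0.2003, 7.8412, 4.8133),
    Checkpoint(9000, 0.2071, 7.5811, 4.7023),
    Checkpoint(10000, 0.2284, 7.6453, 4.7187),
]


def best_by(rows, field):
    """Return the checkpoint with the lowest value in the given column."""
    return min(rows, key=lambda row: getattr(row, field))


if __name__ == "__main__":
    for field in ("val_loss", "wer", "cer"):
        best = best_by(ROWS, field)
        print(f"lowest {field}: step {best.step} ({getattr(best, field)})")
```

Validation loss is lowest at step 3000, whereas WER and CER are lowest at step 9000. The headline results (WER 7.6453, CER 4.7187) match the final row at step 10000, which is slightly behind step 9000 on both metrics.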

Framework versions

Transformers 4.26.0.dev0
Pytorch 1.13.0+cu117
Datasets 2.7.1.dev0
Tokenizers 0.13.2

📄 License

This model is licensed under the Apache-2.0 license.
