whisper-large-v2-mn-13 Open Source Speech Recognition Model - Free Support for Automatic Mongolian Speech Recognition

Whisper Large V2 Mn 13

Developed by bayartsogt

A Mongolian automatic speech recognition model, fine-tuned from OpenAI's whisper-large-v2 on several Mongolian speech datasets.

Speech Recognition

Transformers

Other

Open Source License: Apache-2.0

#Mongolian speech recognition #Low character error rate #Multi-dataset training

Downloads 161

Release date: 12/20/2022

Model Overview

This is an automatic speech recognition (ASR) model for Mongolian: whisper-large-v2 fine-tuned on multiple Mongolian datasets to transcribe Mongolian speech into text.

Model Features

Mongolian optimization

Fine-tuned specifically on Mongolian speech data

Multi-dataset training

Fine-tuned on four Mongolian speech datasets: Common Voice 11.0, FLEURS, ulaanbal-v0 and youtube-mongolian-v1

Low error rate

Reaches a word error rate (WER) of 20.02% and a character error rate (CER) of 6.60% on the Mongolian test split of Common Voice 11.0
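WER and CER are both edit-distance rates: the number of substitutions, deletions and insertions needed to turn the model's hypothesis into the reference, divided by the reference length in words (WER) or characters (CER), expressed here as a percentage. The plain-Python sketch below illustrates that definition, summing errors over the whole corpus as common tools do. It is an assumption that the card's numbers were computed exactly this way; the card does not say whether any text normalization (casing, punctuation) was applied first. The example sentence pair is hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion (reference token missing in hypothesis)
                cur[j - 1] + 1,          # insertion (extra token in hypothesis)
                prev[j - 1] + (r != h),  # substitution, or a match at zero cost
            )
        prev = cur
    return prev[-1]


def wer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level word error rate in percent (errors summed over all utterances)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * errors / words


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level character error rate in percent (spaces count as characters)."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * errors / chars


if __name__ == "__main__":
    # Hypothetical example: "байна уу" transcribed as the single word "байнуу".
    refs = ["сайн байна уу"]
    hyps = ["сайн байнуу"]
    print(f"WER = {wer(refs, hyps):.2f}%")  # 2 word errors / 3 reference words
    print(f"CER = {cer(refs, hyps):.2f}%")  # 2 character errors / 13 reference characters
```

For this pair the sketch reports a WER of 66.67% but a CER of only 15.38%: the merged word costs two errors out of three words, yet only two errors out of thirteen characters. Partial credit for nearly-correct words is one reason CER is usually much lower than WER, as with this model's 6.60% versus 20.02%.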

Model Capabilities

Mongolian speech recognition

Speech-to-text

Use Cases

Speech transcription

Mongolian speech transcription

Convert Mongolian speech content into text

Word error rate 20.02%, character error rate 6.60% (Common Voice 11.0, Mongolian test split)

Voice assistant

Mongolian voice interaction

Can serve as the speech recognition component of voice assistant applications that support Mongolian

🚀 whisper-large-v2-mn-13

This model is a fine-tuned version of openai/whisper-large-v2, trained on several Mongolian speech datasets (listed under Additional Information), for automatic speech recognition of Mongolian.

🚀 Quick Start

On the evaluation set (Common Voice 11.0, Mongolian test split), the model achieves the following results:

Loss: 0.1689
WER: 20.0240%
CER: 6.6010%
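The card provides no usage code, so here is a minimal sketch using the Hugging Face transformers `pipeline` API. It assumes the model is published on the Hub as `bayartsogt/whisper-large-v2-mn-13` (the author's namespace plus the model name above), a recent transformers release whose Whisper generation accepts the `language` and `task` options, and ffmpeg installed for decoding audio files. The helper names `load_asr` and `transcribe` are hypothetical.

```python
import sys

# Assumed Hub id: the author's namespace plus the model name from this card.
MODEL_ID = "bayartsogt/whisper-large-v2-mn-13"


def load_asr(device: int = -1):
    """Build an ASR pipeline; device=-1 runs on CPU, 0 on the first GPU."""
    # Imported here so this file can be imported without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # split long recordings into Whisper's 30 s windows
        device=device,
    )


def transcribe(asr, audio_path: str) -> str:
    """Transcribe one audio file, requesting Mongolian transcription (not translation)."""
    result = asr(audio_path, generate_kwargs={"language": "mongolian", "task": "transcribe"})
    return result["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py audio.wav
    print(transcribe(load_asr(), sys.argv[1]))
```

`transcribe.py` and `audio.wav` are placeholder names. Setting the language and task explicitly keeps Whisper from relying on its own language detection or switching to English translation.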

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

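The card metadata lists four datasets: mozilla-foundation/common_voice_11_0, google/fleurs, bayartsogt/ulaanbal-v0 and bayartsogt/youtube-mongolian-v1. Evaluation was performed on the Mongolian (mn) test split of Common Voice 11.0. Further details, such as how the datasets were combined or preprocessed, are not documented.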

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 25000
mixed_precision_training: Native AMP
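These hyperparameters correspond to fields of the transformers `Seq2SeqTrainingArguments` class. The sketch below writes them as keyword arguments and adds a plain-Python version of the `linear` schedule: a linear warmup over 500 steps followed by linear decay to zero at step 25,000. The card does not include the training script, so the mapping is an assumption: it treats the card's batch sizes as per-device on a single device, and takes `fp16=True` as the setting behind "Native AMP".

```python
# Hyperparameters from the card, written as keyword arguments for
# transformers.Seq2SeqTrainingArguments (assumes a single training device).
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 25_000,
    "fp16": True,  # assumed to correspond to "Native AMP" mixed precision
}


def linear_lr(step: int, base_lr: float = 1e-5, warmup: int = 500, total: int = 25_000) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)  # ramp from 0 up to base_lr
    # Decay from base_lr at the end of warmup down to 0 at the final step.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    for step in (0, 250, 500, 12_750, 25_000):
        print(step, linear_lr(step))
    # Building the real arguments object requires transformers, e.g.:
    # from transformers import Seq2SeqTrainingArguments
    # args = Seq2SeqTrainingArguments(output_dir="out", **TRAINING_KWARGS)
```

Under this schedule the learning rate peaks at 1e-05 at step 500 and has halved to 5e-06 by step 12,750, midway through the decay. The `output_dir` value is a placeholder.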

Training results

Training Loss	Epoch	Step	CER (%)	Validation Loss	WER (%)
0.3921	0.09	1000	15.7845	0.4101	46.9030
0.3115	0.17	2000	14.2911	0.3353	41.8451
0.2659	0.26	3000	11.8131	0.2800	34.6406
0.2477	0.35	4000	10.6659	0.2578	32.0024
0.2274	0.43	5000	10.0460	0.2463	30.3419
0.2059	0.52	6000	9.9264	0.2305	28.5558
0.2092	0.61	7000	9.4277	0.2196	27.8785
0.1956	0.69	8000	9.2745	0.2093	26.8353
0.195	0.78	9000	8.9485	0.2042	26.6168
0.195	0.87	10000	8.5324	0.2001	25.6718
0.1795	0.95	11000	8.1786	0.1936	24.1698
0.1575	1.04	12000	7.8653	0.1915	23.8912
0.1358	1.13	13000	7.6749	0.1918	23.3778
0.1509	1.21	14000	7.7221	0.1852	23.1811
0.1474	1.3	15000	7.3246	0.1764	22.4984
0.1461	1.39	16000	7.3187	0.1793	22.4110
0.134	1.47	17000	7.1123	0.1737	21.9412
0.1289	1.56	18000	7.4593	0.1727	22.0614
0.1287	1.65	19000	7.0230	0.1701	21.4223
0.1196	1.73	20000	6.9447	0.1666	21.2475
0.1275	1.82	21000	6.7956	0.1653	20.8106
0.1329	1.91	22000	6.7729	0.1622	20.3354
0.1294	1.99	23000	6.6448	0.1606	20.2207
0.1043	2.08	24000	6.6010	0.1689	20.0240
0.079	2.17	25000	6.6246	0.1687	20.1005
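The headline evaluation figures (loss 0.1689, WER 20.0240, CER 6.6010) match the step-24000 row rather than the final step-25000 row, which suggests the reported checkpoint is the one with the lowest validation WER. The card does not state how the checkpoint was chosen, so treat that as an inference. The sketch below applies the selection rule to the last four rows of the table; `EvalRow` and `best_by` are hypothetical helper names.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float
    wer: float  # percent
    cer: float  # percent


# Last four rows of the training-results table.
ROWS = [
    EvalRow(22000, 1.91, 0.1329, 0.1622, 20.3354, 6.7729),
    EvalRow(23000, 1.99, 0.1294, 0.1606, 20.2207, 6.6448),
    EvalRow(24000, 2.08, 0.1043, 0.1689, 20.0240, 6.6010),
    EvalRow(25000, 2.17, 0.0790, 0.1687, 20.1005, 6.6246),
]


def best_by(rows: list[EvalRow], key: str) -> EvalRow:
    """Return the row with the lowest value in the given metric column."""
    return min(rows, key=lambda row: getattr(row, key))


if __name__ == "__main__":
    print("lowest WER:     ", best_by(ROWS, "wer").step)
    print("lowest val loss:", best_by(ROWS, "val_loss").step)
```

Selecting by validation loss instead would pick step 23000: loss rises slightly at step 24000 while WER and CER still improve, so the choice of selection metric matters here.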

Framework versions

Transformers 4.26.0.dev0
Pytorch 1.13.1+cu117
Datasets 2.8.1.dev0
Tokenizers 0.13.2

📄 License

This model is licensed under the Apache-2.0 license.

📋 Additional Information

Property	Details
Tags	whisper-event, hf-asr-leaderboard, generated_from_multiple_datasets
Datasets	mozilla-foundation/common_voice_11_0, google/fleurs, bayartsogt/ulaanbal-v0, bayartsogt/youtube-mongolian-v1
Metrics	wer, cer
Model Index Name	whisper-large-v2-mn-13
Evaluation Task	Automatic Speech Recognition
Evaluation Dataset	Common Voice 11.0 (mozilla-foundation/common_voice_11_0, config: mn, split: test)
WER on Evaluation Set (%)	20.02403320952589
CER on Evaluation Set (%)	6.601024224251205
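The evaluation entry names Common Voice 11.0, config `mn`, split `test`. The following rough sketch re-runs that evaluation with the Hugging Face `datasets`, `transformers` and `evaluate` libraries. It assumes the model's Hub id is `bayartsogt/whisper-large-v2-mn-13`, that you have accepted the Common Voice terms on the Hub and are logged in, and that references are compared without text normalization, which the card does not specify; results may therefore differ from the reported numbers. Loading this gated, script-based dataset also behaves differently across `datasets` versions. The function name and its `limit` parameter are hypothetical.

```python
from typing import Optional

# Evaluation target named in the card's metadata.
DATASET_ID = "mozilla-foundation/common_voice_11_0"
CONFIG = "mn"
SPLIT = "test"
MODEL_ID = "bayartsogt/whisper-large-v2-mn-13"  # assumed Hub id


def evaluate_common_voice(limit: Optional[int] = None) -> dict[str, float]:
    """Transcribe the Common Voice mn test split and return WER/CER in percent."""
    # Heavy imports are deferred so this file imports without these packages.
    import evaluate
    from datasets import Audio, load_dataset
    from transformers import pipeline

    ds = load_dataset(DATASET_ID, CONFIG, split=SPLIT)
    ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # Whisper expects 16 kHz
    if limit is not None:
        ds = ds.select(range(min(limit, len(ds))))

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    predictions, references = [], []
    for sample in ds:
        audio = sample["audio"]
        out = asr(
            {"raw": audio["array"], "sampling_rate": audio["sampling_rate"]},
            generate_kwargs={"language": "mongolian", "task": "transcribe"},
        )
        predictions.append(out["text"])
        references.append(sample["sentence"])  # Common Voice transcript column

    wer = evaluate.load("wer").compute(predictions=predictions, references=references)
    cer = evaluate.load("cer").compute(predictions=predictions, references=references)
    return {"wer": 100 * wer, "cer": 100 * cer}  # metrics return fractions


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(evaluate_common_voice(limit=int(sys.argv[1])))
```

Passing a small number on the command line (for example `python evaluate_cv.py 50`, with a placeholder script name) gives a quick spot check, but a subset will not reproduce the full-split figures.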
