whisper-large-v3-myanmar
This model is a fine-tuned version of openai/whisper-large-v3 on the chuuhtetnaing/myanmar-speech-dataset-openslr-80 dataset. It is designed for automatic speech recognition of the Myanmar (Burmese) language.
Quick Start
The model achieves the following results on the evaluation set:
- Loss: 0.1752
- WER: 54.8976
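For a quick check on your own audio, the checkpoint can be loaded through the transformers pipeline. This is a minimal sketch; `example.wav` is a placeholder path, and decoding a local file this way relies on ffmpeg being available on the system.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an automatic-speech-recognition pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="chuuhtetnaing/whisper-large-v3-myanmar",
)

# "example.wav" is a placeholder; point this at any local Myanmar-language recording
output = pipe(
    "example.wav",
    generate_kwargs={"language": "myanmar", "task": "transcribe"},
)
print(output["text"])
```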
Usage Examples
Basic Usage
from datasets import Audio, load_dataset
from transformers import pipeline

# Load the dataset and resample the audio column to 16 kHz (the rate Whisper expects)
dataset = load_dataset("chuuhtetnaing/myanmar-speech-dataset-openslr-80")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
test_dataset = dataset['test']

# Pick one sample from the test split
input_speech = test_dataset[42]['audio']

# Build an ASR pipeline from the fine-tuned checkpoint and transcribe the sample
pipe = pipeline(model='chuuhtetnaing/whisper-large-v3-myanmar')
output = pipe(input_speech, generate_kwargs={"language": "myanmar", "task": "transcribe"})
print(output['text'])
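To reproduce a WER figure like the one reported above, the pipeline predictions can be scored with the evaluate library. The sketch below is not the card's original evaluation script; in particular, the transcript column is assumed to be named `sentence` — check the dataset card for the actual field name.

```python
import evaluate

# Assumes `pipe` and `test_dataset` from the example above are already defined
wer_metric = evaluate.load("wer")

predictions, references = [], []
for sample in test_dataset:
    result = pipe(
        sample["audio"],
        generate_kwargs={"language": "myanmar", "task": "transcribe"},
    )
    predictions.append(result["text"])
    # "sentence" is an assumed column name; adjust to the dataset's schema
    references.append(sample["sentence"])

wer = wer_metric.compute(predictions=predictions, references=references)
# The card reports WER on a percentage scale, hence the factor of 100
print(f"WER: {100 * wer:.4f}")
```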
Technical Details
Training hyperparameters
The following hyperparameters were used during training (an approximate Seq2SeqTrainingArguments sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 60
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 30
- mixed_precision_training: Native AMP
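These settings map roughly onto transformers Seq2SeqTrainingArguments as sketched below. This is a reconstruction for reference only; `output_dir`, the evaluation strategy, and `predict_with_generate` are assumptions not stated in this card.

```python
from transformers import Seq2SeqTrainingArguments

# Approximate reconstruction of the reported hyperparameters
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-myanmar",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    gradient_accumulation_steps=3,   # gives the total train batch size of 60
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=30,
    fp16=True,                       # mixed precision (native AMP)
    evaluation_strategy="epoch",     # assumed; the card reports per-epoch results
    predict_with_generate=True,      # assumed, needed to compute WER during eval
)
```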
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.9771 | 1.0 | 42 | 0.7598 | 100.0 |
| 0.3477 | 2.0 | 84 | 0.2140 | 89.8931 |
| 0.2244 | 3.0 | 126 | 0.1816 | 79.0294 |
| 0.1287 | 4.0 | 168 | 0.1510 | 71.9947 |
| 0.1029 | 5.0 | 210 | 0.1575 | 77.8718 |
| 0.0797 | 6.0 | 252 | 0.1315 | 70.5254 |
| 0.0511 | 7.0 | 294 | 0.1143 | 70.5699 |
| 0.03 | 8.0 | 336 | 0.1154 | 68.1656 |
| 0.0211 | 9.0 | 378 | 0.1289 | 69.1897 |
| 0.0151 | 10.0 | 420 | 0.1318 | 66.7854 |
| 0.0113 | 11.0 | 462 | 0.1478 | 69.1451 |
| 0.0079 | 12.0 | 504 | 0.1484 | 66.2066 |
| 0.0053 | 13.0 | 546 | 0.1389 | 65.0935 |
| 0.0031 | 14.0 | 588 | 0.1479 | 64.3811 |
| 0.0014 | 15.0 | 630 | 0.1611 | 64.8264 |
| 0.001 | 16.0 | 672 | 0.1627 | 63.3571 |
| 0.0012 | 17.0 | 714 | 0.1546 | 65.0045 |
| 0.0006 | 18.0 | 756 | 0.1566 | 64.5147 |
| 0.0006 | 20.0 | 760 | 0.1581 | 64.6928 |
| 0.0002 | 21.0 | 798 | 0.1621 | 63.9804 |
| 0.0003 | 22.0 | 836 | 0.1664 | 60.8638 |
| 0.0002 | 23.0 | 874 | 0.1663 | 58.5040 |
| 0.0 | 24.0 | 912 | 0.1699 | 55.8326 |
| 0.0 | 25.0 | 950 | 0.1715 | 55.0312 |
| 0.0 | 26.0 | 988 | 0.1730 | 54.9866 |
| 0.0 | 27.0 | 1026 | 0.1740 | 54.8976 |
| 0.0 | 28.0 | 1064 | 0.1747 | 54.8976 |
| 0.0 | 29.0 | 1102 | 0.1751 | 54.8976 |
| 0.0 | 30.0 | 1140 | 0.1752 | 54.8976 |
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.15.1
License
This model is licensed under the Apache 2.0 license.
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of openai/whisper-large-v3 |
| Training Data | chuuhtetnaing/myanmar-speech-dataset-openslr-80 |