whisper-th-large-v3-combined Open-source Thai Speech Recognition Model - Accurately Recognize Thai Speech with Low Error Rates

Whisper Th Large V3 Combined

Developed by biodatlab

This is a Thai automatic speech recognition model fine-tuned from OpenAI's Whisper Large V3 model. It achieves a 6.59% word error rate on the Common Voice 13 Thai test set.

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Thai Speech Recognition #Low Word Error Rate #Multi-Dataset Fine-Tuning

Downloads 1,354

Release Time: 2/20/2024

Model Overview

This is an automatic speech recognition (ASR) model for Thai speech transcription, fine-tuned on augmented versions of the Common Voice 13 and FLEURS datasets.

Model Features

Low Word Error Rate

Only 6.59% word error rate (WER) on the Common Voice 13 Thai test set

Thai Optimization

Specially fine-tuned for Thai speech characteristics

Mixed Dataset Training

Trained on augmented versions of multiple datasets, including Common Voice 13 and FLEURS

Model Capabilities

Thai Speech Recognition

Audio Transcription

Long Audio Processing (audio is split into 30-second chunks)

Use Cases

Speech Transcription

Thai Meeting Minutes

Automatically transcribe Thai meeting recordings into text

Highly accurate transcription text

Thai Media Subtitle Generation

Automatically generate subtitles for Thai video content

🚀 Whisper Large V3 (Thai): Combined V1

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3). It is trained on augmented versions of the mozilla-foundation/common_voice_13_0 th, google/fleurs, and curated datasets. It is intended for Thai automatic speech recognition and reaches a WER of 6.59 (with Deepcut Tokenizer) on the Common Voice 13 test set.

🚀 Quick Start

Use the model with the Hugging Face transformers library as follows:

Basic Usage

import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-large-v3-combined"  # model ID on the Hugging Face Hub
lang = "th"  # transcribe in Thai

# Use the first GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,  # process long audio in 30-second chunks
    device=device,
)
# Force the decoder to transcribe in Thai instead of auto-detecting the language
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
    language=lang,
    task="transcribe",
)
text = pipe("audio.mp3")["text"]  # transcribe an audio file and keep the text
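The `chunk_length_s=30` setting above means long recordings are processed as overlapping 30-second windows rather than in one pass. The following is a simplified sketch of how such windows could be laid out, not the pipeline's actual implementation. The function name `chunk_windows` and the 5-second overlap on each side are assumptions for illustration (the transformers documentation describes a default stride of `chunk_length_s / 6`).

```python
def chunk_windows(total_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) windows in seconds that cover [0, total_s].

    Hypothetical helper: consecutive windows overlap so that words cut at a
    chunk boundary appear whole in at least one window.
    """
    if chunk_s <= 2 * stride_s:
        raise ValueError("chunk_s must be larger than twice stride_s")
    step = chunk_s - 2 * stride_s  # how far each new window advances
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:  # the last window reached the end of the audio
            break
        start += step
    return windows


# A 70-second recording is covered by three overlapping windows.
for start, end in chunk_windows(70):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Audio shorter than one chunk yields a single window, so short clips are unaffected by chunking.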

✨ Features

This fine-tuned Whisper Large V3 model for Thai achieves a WER of 6.59 (with Deepcut Tokenizer) on the Common Voice 13 test set.

📚 Documentation

Model description

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on augmented versions of the mozilla-foundation/common_voice_13_0 th, google/fleurs, and curated datasets. It achieves the following results on the Common Voice 13 test set:

WER: 6.59 (with Deepcut Tokenizer)
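Thai is written without spaces between words, so WER depends on how the text is segmented, which is why the score is reported together with the tokenizer. The sketch below shows how WER could be computed once reference and hypothesis have been split into words. It is an illustration, not the authors' evaluation script: the function name `word_error_rate` is hypothetical, the example tokens are segmented by hand, and any text normalization used in the reported result is not covered. With the deepcut package installed, `deepcut.tokenize(text)` could produce the token lists instead.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Both arguments are lists of word tokens (already segmented).
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hand-segmented example: the hypothesis drops one of four reference words.
reference = ["ฉัน", "ชอบ", "กิน", "ข้าว"]
hypothesis = ["ฉัน", "ชอบ", "ข้าว"]
print(word_error_rate(reference, hypothesis))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.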

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
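To make the schedule concrete, here is a small sketch of the learning rate that a linear scheduler with warmup would produce for the values listed above (peak rate 1e-05, 500 warmup steps, 10000 training steps). The function `linear_lr` is a hypothetical stand-in written for illustration. It follows the usual shape of a linear warmup and decay schedule and is not taken from the authors' training code.

```python
def linear_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=10_000):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr during warmup, then decays
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        multiplier = step / warmup_steps
    else:
        remaining = max(0, total_steps - step)
        multiplier = remaining / (total_steps - warmup_steps)
    return base_lr * multiplier


# Print the rate at a few points along the 10000-step run.
for step in (0, 250, 500, 5250, 10_000):
    print(step, linear_lr(step))
```

The rate peaks at step 500, the end of warmup, and is back to half the peak by step 5250, midway through the decay phase.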

Framework versions

Transformers 4.37.2
Pytorch 2.1.0
Datasets 2.16.1
Tokenizers 0.15.1

📄 License

This model is licensed under the Apache 2.0 license.

📦 Information Table

Property	Details
Model Type	Fine-tuned Whisper Large V3 for Thai
Training Data	Augmented versions of mozilla-foundation/common_voice_13_0 th, google/fleurs, and curated datasets
Evaluation Metric	WER
Base Model	openai/whisper-large-v3
Results on Test Set (WER)	6.59 (with Deepcut Tokenizer)

📚 Citation

Cite using BibTeX:

@misc{thonburian_whisper_med,
    author       = { Atirut Boribalburephan and Zaw Htet Aung and Knot Pipatsrisawat and Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}
