wav2vec2-large-xlsr-53-th-cv8-deepcut: an open-source Thai speech recognition model

Wav2vec2 Large Xlsr 53 Th Cv8 Deepcut

Developed by wannaphong

This model is a Thai automatic speech recognition model trained on the CommonVoice V8 dataset, incorporating the DeepCut tokenizer and language model to improve recognition accuracy.

Speech Recognition

Transformers

Open Source License: Apache-2.0
Tags: Thai speech recognition, Low word error rate, DeepCut tokenizer


Release Time: 6/7/2022

Model Overview

This model fine-tunes wav2vec2-large-xlsr-53 using the Thai CommonVoice V8 dataset, specifically designed for Thai speech recognition tasks. It supports the DeepCut tokenizer and integrates a language model to enhance performance.

Model Features

Integrated Language Model

Adding a language model improves recognition accuracy: on the CommonVoice V8 test set, WER drops from 11.42% to 9.61% when scored with deepcut tokenization, and from 16.35% to 12.63% when scored with newmm tokenization.

Support for Multiple Tokenizers

The model is trained on text pre-tokenized with DeepCut, and WER is reported under both DeepCut and newmm word tokenization, so results can be compared under either segmentation.

Multi-Dataset Training

Trained on both CommonVoice V7 and V8 datasets, enhancing the model's generalization capability.

Model Capabilities

Thai speech recognition

Support for multiple tokenization methods

High-accuracy speech-to-text

Use Cases

Speech Transcription

Thai Speech Transcription

Convert Thai speech content into text

Achieves 9.61% WER on the CommonVoice V8 test set when decoded with the language model and scored with deepcut tokenization

Voice Assistants

Thai Voice Command Recognition

Used for Thai voice assistant command recognition systems

🚀 Thai Wav2Vec2 with CommonVoice V8 (deepcut tokenizer) + language model

This model performs automatic speech recognition for Thai. It is trained on the CommonVoice V8 dataset and can be combined with a language model, which lowers the error rate of speech-to-text conversion.

🚀 Quick Start

This model was trained on the CommonVoice V8 dataset, which adds data to the CommonVoice V7 dataset used for airesearch/wav2vec2-large-xlsr-53-th. It was fine-tuned from wav2vec2-large-xlsr-53.
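The page does not include usage code, so the following is only a minimal sketch of how a checkpoint like this is commonly loaded with the Transformers library. The Hub identifier is assumed from the developer and model name given above, the helper name `transcribe` is hypothetical, and the sketch uses plain greedy CTC decoding without the language model.

```python
# Assumed Hub identifier, built from the developer and model name on this page.
MODEL_ID = "wannaphong/wav2vec2-large-xlsr-53-th-cv8-deepcut"
# wav2vec2 XLSR checkpoints are pretrained on 16 kHz audio.
TARGET_SAMPLING_RATE = 16_000


def transcribe(waveform, sampling_rate=TARGET_SAMPLING_RATE, model_id=MODEL_ID):
    """Greedy CTC transcription of a 1-D float waveform (no language model)."""
    if sampling_rate != TARGET_SAMPLING_RATE:
        raise ValueError("resample the audio to 16 kHz before calling transcribe()")

    # Imported lazily so this module can be imported without torch/transformers.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Normalise the raw samples and wrap them in a batch of size one.
    inputs = processor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: most likely token per frame, then CTC collapse.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(waveform)` with a 1-D array of 16 kHz mono samples would download the checkpoint on first use. Because the training text was pre-tokenized, the returned string is expected to contain spaces between Thai words.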

✨ Features

Language: Thai
Tags: automatic-speech-recognition
License: apache-2.0
Datasets: common_voice
Metrics: wer, cer

📚 Documentation

Model description

Technical report: Thai Wav2Vec2.0 with CommonVoice V8

Datasets

The training data extends the Common Voice V7 dataset with the data that is new in Common Voice V8. In other words, all Common Voice V7 data is removed from Common Voice V8 before splitting, and the Common Voice V7 dataset is then added back to the resulting splits.

It uses the ekapolc/Thai_commonvoice_split script for splitting the Common Voice dataset.
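The idea of keeping the V7 splits fixed and distributing only the new V8 clips can be sketched as follows. This is an illustration under stated assumptions, not the ekapolc/Thai_commonvoice_split script: the function name `extend_splits`, the clip identifiers, the 80/10/10 ratios, and the seeded random shuffle are all hypothetical stand-ins for whatever the real script does.

```python
import random


def extend_splits(v7_splits, v8_clip_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Keep the V7 splits intact and distribute only the clips that are new in V8."""
    # Every clip that already belongs to some V7 split stays where it is.
    already_used = set().union(*v7_splits.values())
    new_clips = sorted(set(v8_clip_ids) - already_used)

    # Hypothetical splitting rule: a seeded shuffle followed by ratio cuts.
    random.Random(seed).shuffle(new_clips)
    n_train = int(len(new_clips) * ratios[0])
    n_dev = int(len(new_clips) * ratios[1])
    new_parts = {
        "train": new_clips[:n_train],
        "dev": new_clips[n_train:n_train + n_dev],
        "test": new_clips[n_train + n_dev:],
    }

    # Add the V7 data back: each final split is its V7 part plus its new part.
    return {
        name: list(v7_splits[name]) + new_parts[name]
        for name in ("train", "dev", "test")
    }
```

Keeping the V7 assignments unchanged means a clip that was in the V7 test set cannot end up in the V8 training set, which is what makes results on the V7 test set comparable with earlier models.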

Models

This model was fine-tuned from the wav2vec2-large-xlsr-53 model with the Thai Common Voice V8 dataset and uses pre-tokenization with deepcut.tokenize.
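Thai is written without spaces between words, so pre-tokenization amounts to inserting word boundaries into each transcript before training. A minimal sketch, assuming the tokenizer is passed in as a callable that returns a list of tokens (as `deepcut.tokenize` does); the helper name `pretokenize` is hypothetical.

```python
def pretokenize(text, tokenize):
    """Insert spaces at the word boundaries chosen by `tokenize`."""
    # Strip each token so that whitespace tokens do not produce double spaces.
    tokens = [tok.strip() for tok in tokenize(text)]
    return " ".join(tok for tok in tokens if tok)
```

With deepcut installed, the call would look like `pretokenize(sentence, deepcut.tokenize)`. Passing the tokenizer as an argument keeps the helper independent of any one segmenter, so the same function could be used with newmm for comparison.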

Evaluation

Test with CommonVoice V8 Testset

Model	WER by newmm (%)	WER by deepcut (%)	CER
AIResearch.in.th and PyThaiNLP	17.414503	11.923089	3.854153
wav2vec2 with deepcut	16.354521	11.424476	3.684060
wav2vec2 with newmm	16.698299	11.436941	3.737407
wav2vec2 with deepcut + language model	12.630260	9.613886	3.292073
wav2vec2 with newmm + language model	12.583706	9.598305	3.276610
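The table reports WER under two word tokenizers because the word count, and therefore the error rate, depends on how Thai text is segmented. The sketch below shows one common way to compute these metrics with a pluggable tokenizer. It is not the evaluation script behind the table: the function names, the corpus-level aggregation, and the choice to ignore spaces for CER are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row dynamic programming)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,            # deletion
                row[j - 1] + 1,        # insertion
                prev_diag + (r != h),  # substitution, or match at no cost
            )
            prev_diag, row[j] = row[j], cur
    return row[-1]


def error_rate(references, hypotheses, tokenize):
    """Total edit errors divided by total reference tokens, in percent."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_toks, hyp_toks = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_toks, hyp_toks)
        total += len(ref_toks)
    return 100.0 * errors / total


def wer(references, hypotheses, word_tokenize=str.split):
    """Word error rate; swap in a Thai word tokenizer for real use."""
    return error_rate(references, hypotheses, word_tokenize)


def cer(references, hypotheses):
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    return error_rate(references, hypotheses, lambda s: list(s.replace(" ", "")))
```

The default `str.split` only works for text that is already space-separated. To mirror the two WER columns, `word_tokenize` would be replaced by a function that applies deepcut or newmm segmentation to both the reference and the hypothesis.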

Test with CommonVoice V7 Testset (the same test split as CV V7)

Model	WER by newmm (%)	WER by deepcut (%)	CER
AIResearch.in.th and PyThaiNLP	13.936698	9.347462	2.804787
wav2vec2 with deepcut	12.776381	8.773006	2.628882
wav2vec2 with newmm	12.750596	8.672616	2.623341
wav2vec2 with deepcut + language model	9.940050	7.423313	2.344940
wav2vec2 with newmm + language model	9.559724	7.339654	2.277071

This uses the same test set as https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th.
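To make the effect of the language model concrete, the short calculation below takes the "WER by deepcut (%)" values of the "wav2vec2 with deepcut" rows from the two tables above and reports the absolute and relative reduction. The helper name `reduction` is hypothetical; the numbers are copied from the tables.

```python
# WER by deepcut (%) for "wav2vec2 with deepcut": (without LM, with LM).
wer_deepcut = {
    "CV8 test": (11.424476, 9.613886),
    "CV7 test": (8.773006, 7.423313),
}


def reduction(before, after):
    """Return (absolute reduction in points, relative reduction in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


for name, (before, after) in wer_deepcut.items():
    absolute, relative = reduction(before, after)
    print(f"{name}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

Distinguishing the two figures matters when comparing models: an absolute drop of under two points on these test sets still corresponds to a relative reduction of roughly 15%.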

Links:

GitHub Dataset: https://github.com/wannaphong/thai_commonvoice_dataset
Technical report: Thai Wav2Vec2.0 with CommonVoice V8

📄 License

The model is released under the Apache-2.0 license.

🔧 Technical Details

The BibTeX entry and citation info for this model are as follows:

@misc{phatthiyaphaibun2022thai,
      title={Thai Wav2Vec2.0 with CommonVoice V8}, 
      author={Wannaphong Phatthiyaphaibun and Chompakorn Chaksangchaichot and Peerat Limkonchotiwat and Ekapol Chuangsuwanich and Sarana Nutanong},
      year={2022},
      eprint={2208.04799},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
