Open-source Thai speech recognition model: wav2vec2-large-xlsr-53-th-cv8-newmm

Wav2vec2 Large Xlsr 53 Th Cv8 Newmm

Developed by wannaphong

This model is a Thai automatic speech recognition model fine-tuned from wav2vec2-large-xlsr-53 on the CommonVoice V8 dataset. It uses the newmm tokenizer for pre-tokenization and an integrated language model, and reaches 12.58% WER (newmm tokenizer) on the CommonVoice V8 test set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Thai Speech Recognition #Low Word Error Rate #Multiple Tokenizer Support

Downloads 6,486

Release Time: 6/6/2022

Model Overview

This model is optimized for Thai speech recognition. It combines training on the CommonVoice V8 dataset with a language model to lower both Word Error Rate (WER) and Character Error Rate (CER).

Model Features

Improved Dataset

Trained on the CommonVoice V8 dataset, which contains more data than V7. In the evaluation tables, the models trained on it reach lower WER and CER than the AIResearch.in.th and PyThaiNLP baseline trained on V7.

Optimized Tokenization

Employs the newmm tokenizer for pre-tokenization, optimized for Thai language characteristics.

Language Model Integration

Incorporates a language model to further enhance recognition accuracy.

Multi-Metric Evaluation

Evaluates both Word Error Rate (WER) and Character Error Rate (CER) to comprehensively measure model performance.
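As a rough illustration of the two metrics, the sketch below computes WER and CER from a Levenshtein edit distance. It is not the evaluation script behind the reported numbers. The function names are made up for this example, and the word lists are assumed to be already segmented (for Thai, by a tokenizer such as newmm or deepcut).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref_words: Sequence[str], hyp_words: Sequence[str]) -> float:
    """Word error rate for one utterance, given pre-segmented word lists."""
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(ref_text: str, hyp_text: str) -> float:
    """Character error rate for one utterance, over the raw characters."""
    return edit_distance(list(ref_text), list(hyp_text)) / len(ref_text)
```

Because Thai is written without spaces between words, the WER of the same transcript depends on which tokenizer segments it, which is why the evaluation reports WER separately for newmm and deepcut.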

Model Capabilities

Thai Speech Recognition

Speech-to-Text

Multi-Metric Performance Evaluation

Use Cases

Speech Transcription

Thai Speech Transcription

Converts Thai speech content into text

Achieved 12.58% WER (newmm tokenizer, with language model) on the CommonVoice V8 test set.
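A minimal transcription sketch using the Transformers automatic-speech-recognition pipeline is shown below. The Hub identifier is an assumption pieced together from the developer and model names on this page, and `transcribe` is a hypothetical helper. Decoding an audio file from a path also assumes ffmpeg is available on the system.

```python
# Assumed Hub id, built from the developer name and model name on this page.
MODEL_ID = "wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm"


def transcribe(audio_path: str) -> str:
    """Transcribe one Thai audio file and return the recognized text."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

If the repository ships a language-model decoder and the pyctcdecode and kenlm packages are installed, the pipeline is expected to use it for decoding. Otherwise it should fall back to plain CTC decoding, which corresponds to the rows without a language model in the evaluation tables.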

Voice Assistants

Thai Voice Command Recognition

Used for Thai voice assistants or smart device command recognition

🚀 Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer) + language model

This project presents a Thai Wav2Vec2 model trained on the CommonVoice V8 dataset, combined with the CommonVoice V7 data used in airesearch/wav2vec2-large-xlsr-53-th. It is a fine-tuned version of wav2vec2-large-xlsr-53.

✨ Features

Trained on a dataset that combines CommonVoice V7 and V8.
Fine-tuned from the wav2vec2-large-xlsr-53 model.
Uses pythainlp.tokenize.word_tokenize for pre-tokenization.

📚 Documentation

Model description

Technical report: Thai Wav2Vec2.0 with CommonVoice V8

Datasets

The dataset extends the Common Voice V7 dataset with the data that is new in Common Voice V8. Put another way, all Common Voice V7 data is removed from Common Voice V8, the remaining V8 data is split, and the Common Voice V7 data is then added back. The ekapolc/Thai_commonvoice_split script is used for splitting the Common Voice dataset.
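The merge described here can be sketched with plain lists of clip identifiers. This is only an illustration under stated assumptions: `merge_v7_with_new_v8` is a hypothetical helper, clips are assumed to be identified by unique ids, and `split_fn` stands in for the ekapolc/Thai_commonvoice_split script, whose actual logic is not reproduced.

```python
from typing import Callable, Dict, List

SPLIT_NAMES = ("train", "validation", "test")


def merge_v7_with_new_v8(
    v7_splits: Dict[str, List[str]],
    v8_clips: List[str],
    split_fn: Callable[[List[str]], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Keep the V7 splits unchanged and distribute only the V8-only clips."""
    # Every clip id that already belongs to some V7 split.
    seen = {clip for clips in v7_splits.values() for clip in clips}
    # Step 1: remove all V7 data from V8.
    new_clips = [clip for clip in v8_clips if clip not in seen]
    # Step 2: split what remains (placeholder for the real split script).
    new_splits = split_fn(new_clips)
    # Step 3: add the V7 data back, split by split.
    return {
        name: list(v7_splits.get(name, [])) + list(new_splits.get(name, []))
        for name in SPLIT_NAMES
    }
```

Keeping the V7 splits intact means no clip from the V7 test set can end up in the new training split, which is what makes a comparison on the V7 test set meaningful.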

Models

This model is a fine-tuned version of wav2vec2-large-xlsr-53 using the Thai Common Voice V8 dataset. It uses pythainlp.tokenize.word_tokenize for pre-tokenization.
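Pre-tokenization can be sketched as segmenting a transcript into words and joining them with spaces. The `pretokenize` helper below is hypothetical, and joining on single spaces is an assumption rather than something the source states. The `word_tokenize(text, engine="newmm")` call is the PyThaiNLP function named above, with the newmm engine taken from the model name.

```python
from typing import Callable, List, Optional


def pretokenize(
    text: str,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Segment text into words and join them with single spaces."""
    if tokenizer is None:
        # Imported lazily so the sketch can be loaded without PyThaiNLP installed.
        from pythainlp.tokenize import word_tokenize

        def tokenizer(s: str) -> List[str]:
            return word_tokenize(s, engine="newmm")

    # Drop whitespace-only tokens so the output has exactly one space per boundary.
    words = [w.strip() for w in tokenizer(text)]
    return " ".join(w for w in words if w)
```

Passing a different callable, for example one that wraps the deepcut engine, swaps the segmentation without changing the rest of the code.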

Training

Much of the training code is borrowed from vistec-AI/wav2vec2-large-xlsr-53-th, and a bug in that training code is fixed in vistec-AI/wav2vec2-large-xlsr-53-th#2.

Evaluation

Test with the CommonVoice V8 test set

Model	WER by newmm (%)	WER by deepcut (%)	CER
AIResearch.in.th and PyThaiNLP	17.414503	11.923089	3.854153
wav2vec2 with deepcut	16.354521	11.424476	3.684060
wav2vec2 with newmm	16.698299	11.436941	3.737407
wav2vec2 with deepcut + language model	12.630260	9.613886	3.292073
wav2vec2 with newmm + language model	12.583706	9.598305	3.276610
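To put the CommonVoice V8 figures above in perspective, the snippet below computes relative WER reduction from the "WER by newmm (%)" column. The numbers are copied from the table. The helper name and the dictionary layout are made up for this example.

```python
# "WER by newmm (%)" on the CommonVoice V8 test set, copied from the table.
WER_NEWMM_CV8 = {
    "AIResearch.in.th and PyThaiNLP": 17.414503,
    "wav2vec2 with newmm": 16.698299,
    "wav2vec2 with newmm + language model": 12.583706,
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent, measured against the baseline."""
    return (baseline - improved) / baseline * 100.0


vs_baseline = relative_reduction(
    WER_NEWMM_CV8["AIResearch.in.th and PyThaiNLP"],
    WER_NEWMM_CV8["wav2vec2 with newmm + language model"],
)
vs_no_lm = relative_reduction(
    WER_NEWMM_CV8["wav2vec2 with newmm"],
    WER_NEWMM_CV8["wav2vec2 with newmm + language model"],
)
print(f"vs. baseline: {vs_baseline:.2f}%  vs. no language model: {vs_no_lm:.2f}%")
```

By this measure, the newmm model with a language model cuts WER by roughly 27.7% relative to the AIResearch.in.th and PyThaiNLP baseline, and most of that gain comes from adding the language model.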

Test with the CommonVoice V7 test set (the same test set as CV V7)

Model	WER by newmm (%)	WER by deepcut (%)	CER
AIResearch.in.th and PyThaiNLP	13.936698	9.347462	2.804787
wav2vec2 with deepcut	12.776381	8.773006	2.628882
wav2vec2 with newmm	12.750596	8.672616	2.623341
wav2vec2 with deepcut + language model	9.940050	7.423313	2.344940
wav2vec2 with newmm + language model	9.559724	7.339654	2.277071

This evaluation uses the same test set as https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th.

BibTeX entry and citation info

@misc{phatthiyaphaibun2022thai,
      title={Thai Wav2Vec2.0 with CommonVoice V8}, 
      author={Wannaphong Phatthiyaphaibun and Chompakorn Chaksangchaichot and Peerat Limkonchotiwat and Ekapol Chuangsuwanich and Sarana Nutanong},
      year={2022},
      eprint={2208.04799},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Model Type	Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer) + language model
Training Data	CommonVoice V8 dataset combined with CommonVoice V7 data
Metrics	WER, CER
License	Apache-2.0
Tags	automatic-speech-recognition
Language	Thai
