wav2vec-NCKH-2022 Open-source Vietnamese Speech Recognition Model - Support for Fast Audio-to-Text Conversion

Wav2vec NCKH 2022

Developed by hoangbinhmta99

Vietnamese automatic speech recognition model based on Wav2vec2 architecture, supporting audio-to-text conversion

Speech Recognition

Transformers

Other #Vietnamese speech recognition #Transformer architecture #Low-resource optimization

Downloads 29

Release Time : 3/30/2022

Model Overview

This is an automatic speech recognition (ASR) model built on the Wav2vec2 Transformer architecture and optimized for Vietnamese. It converts Vietnamese speech into the corresponding text.

Model Features

Vietnamese speech recognition

Speech recognition capability specifically optimized for Vietnamese

Transformer architecture-based

Uses the Wav2vec2 Transformer architecture for speech recognition

Pre-trained model conversion

Supports converting pre-trained fairseq .pt checkpoints to the Hugging Face Transformers format

Model Capabilities

Vietnamese speech recognition

Audio-to-text conversion

Automatic speech recognition

Use Cases

Speech transcription

Vietnamese speech-to-text

Converts Vietnamese speech into editable text

Voice assistant

Vietnamese voice command recognition

Used to build voice assistant systems supporting Vietnamese

🚀 Wav2vec2 NCKH Vietnamese 2022

This is a Transformer-based speech recognition model that performs automatic speech recognition (ASR) on Vietnamese speech.

🚀 Quick Start

Convert a .pt model to the Transformers format

You can convert a fairseq .pt checkpoint to the Hugging Face Transformers format with the following commands:

# Install the conversion dependencies
pip install "transformers[sentencepiece]"
pip install -U fairseq
# Fetch the conversion script from the transformers repository
git clone https://github.com/huggingface/transformers.git
cp transformers/src/transformers/models/wav2vec2/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py .
# Download the pre-trained checkpoint and the letter dictionary
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small.pt -O ./wav2vec_small.pt
mkdir dict
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt -O ./dict/dict.ltr.txt
mkdir outputs
# Convert the checkpoint; --not_finetuned marks it as pre-trained only
python convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py \
  --pytorch_dump_folder_path ./outputs \
  --checkpoint_path ./wav2vec_small.pt \
  --dict_path ./dict/dict.ltr.txt \
  --not_finetuned
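
After the conversion finishes, it can help to confirm that ./outputs actually holds a loadable model before uploading it. The following is a minimal sketch, not part of the original instructions: `missing_files` is a hypothetical helper, and the file names it looks for are an assumption based on what `save_pretrained` normally writes (a `config.json` plus a weights file whose name depends on the installed transformers version).

```python
from pathlib import Path

# Assumption: save_pretrained() writes config.json plus one weights file;
# the weights file name depends on the installed transformers version.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def missing_files(folder):
    """Return the names of expected files that are absent from `folder`."""
    root = Path(folder)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config.json")
    # Either weights file name is acceptable.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling `missing_files("./outputs")` should return an empty list when both the configuration and a weights file are present.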

Install Git LFS and upload the model

To install Git LFS and push the model to its Hugging Face repository, you can use the following commands:

# Install Git LFS (Debian/Ubuntu)
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
# Set the commit identity before the first commit
git config --global user.email [your-email]
git config --global user.name [your-name]
# Clone the model repository; place the model files in it before committing
git clone https://huggingface.co/hoangbinhmta99/wav2vec-demo
ls
cd wav2vec-demo/
git status
git add .
git commit -m "First model version"
git push

✨ Features

Datasets: This model is trained on the vivos and common_voice datasets.
Metrics: The model uses the wer (Word Error Rate) metric for evaluation.
Pipeline Tag: It belongs to the automatic-speech-recognition pipeline.
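
For a checkpoint that has been fine-tuned for recognition, the automatic-speech-recognition pipeline in transformers offers a short path from an audio file to text. The sketch below is illustrative only: `transcribe` is a hypothetical helper, `model_id` must be supplied by the caller, and it assumes the referenced repository contains a CTC head together with tokenizer and feature-extractor files. A checkpoint converted with --not_finetuned would not meet that assumption.

```python
def transcribe(audio_path, model_id):
    """Return the text recognized in one audio file (hypothetical helper)."""
    # Lazy import: transformers is only needed when the function is called.
    from transformers import pipeline

    # Assumption: `model_id` points to a fine-tuned CTC checkpoint that ships
    # its tokenizer and feature-extractor files alongside the weights.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict whose "text" key holds the transcript.
    return asr(audio_path)["text"]
```

A call such as `transcribe("sample.wav", model_id)` needs ffmpeg on the system, since the pipeline uses it to decode audio given as a file path.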

📦 Installation

The installation steps are included in the "Quick Start" section. You can follow the commands to install the necessary dependencies and clone the model repository.

📚 Documentation

Model Information

Property	Details
Model Type	Wav2vec2 NCKH Vietnamese 2022
Training Data	vivos, common_voice
Metrics	wer
Pipeline Tag	automatic-speech-recognition
Tags	audio, speech, Transformer

Model Results

Task: Speech Recognition (automatic-speech-recognition)
Dataset: Common Voice vi (common_voice with args vi)
Metrics:
- Test WER: not reported
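
WER counts the word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. A minimal sketch of that computation follows; `word_error_rate` is a hypothetical name, it splits on whitespace only, and it does no text normalization (casing, punctuation), which real evaluations usually add.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` gives 0.5: one substitution and one deletion over four reference words.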

📄 License

This model is licensed under the cc-by-nc-4.0 license.
