Wav2vec2 NCKH Vietnamese 2022
Wav2vec2 NCKH Vietnamese 2022 is a Transformer-based model for automatic speech recognition (ASR) of Vietnamese speech.
Quick Start
Convert a .pt checkpoint to the Transformers format
To convert a fairseq .pt checkpoint to the Hugging Face Transformers format, run the following commands:
pip install "transformers[sentencepiece]"
pip install fairseq -U
git clone https://github.com/huggingface/transformers.git
cp transformers/src/transformers/models/wav2vec2/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py .
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small.pt -O ./wav2vec_small.pt
mkdir dict
wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt -O ./dict/dict.ltr.txt
mkdir outputs
python convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py \
  --pytorch_dump_folder_path ./outputs --checkpoint_path ./wav2vec_small.pt \
  --dict_path ./dict/dict.ltr.txt --not_finetuned
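After the conversion finishes, it can help to confirm that the dump folder looks like a loadable checkpoint before uploading it. The following is a minimal sketch using only the Python standard library. The function name `check_converted_folder` is hypothetical, and the weight file names are an assumption about what `save_pretrained()` writes (which one appears depends on the installed transformers version).

```python
import json
from pathlib import Path

# Assumption: save_pretrained() writes one of these weight files,
# depending on the installed transformers version.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def check_converted_folder(folder):
    """Return the parsed config.json if the folder looks like a converted checkpoint.

    Raises FileNotFoundError when config.json or a weight file is missing.
    """
    folder = Path(folder)
    config_path = folder / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"missing {config_path}")
    if not any((folder / name).is_file() for name in WEIGHT_FILES):
        raise FileNotFoundError(
            f"no weight file ({', '.join(WEIGHT_FILES)}) found in {folder}"
        )
    # Parse the config so that a truncated or invalid JSON file fails early.
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

For the commands above, calling `check_converted_folder("./outputs")` should return the model configuration as a dictionary if the conversion succeeded.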
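Install Git LFS and upload the model
To install Git LFS, clone the model repository, and push the model files, use the following commands. Copy the converted files from ./outputs into the cloned repository before running git add.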
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/hoangbinhmta99/wav2vec-demo
ls
cd wav2vec-demo/
git config --global user.email [youremail]
git config --global user.name [yourname]
git status
git add .
git commit -m "First model version"
git push
Features
- Datasets: This model is trained on the vivos and common_voice datasets.
- Metrics: The model uses the wer (Word Error Rate) metric for evaluation.
- Pipeline Tag: It belongs to the automatic-speech-recognition pipeline.
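To make the wer metric concrete: Word Error Rate is the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the model output, divided by the number of words in the reference. Below is a minimal pure-Python sketch of that definition. The function names are illustrative, and this is not necessarily the implementation used to evaluate this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

For example, `word_error_rate("a b c d", "a x c d")` gives 0.25, because one of the four reference words is substituted. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.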
Installation
The installation steps are included in the "Quick Start" section. You can follow the commands to install the necessary dependencies and clone the model repository.
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | Wav2vec2 NCKH Vietnamese 2022 |
| Training Data | vivos, common_voice |
| Metrics | wer |
| Pipeline Tag | automatic-speech-recognition |
| Tags | audio, speech, Transformer |
Model Results
- Task: Speech Recognition (automatic-speech-recognition)
- Dataset: Common Voice vi (common_voice with args vi)
- Metrics:
License
This model is licensed under the cc-by-nc-4.0 license.