Wav2vec2 Base Vietnamese 250h

Developed by nguyenvulebinh
A Vietnamese automatic speech recognition (ASR) model based on the wav2vec 2.0 architecture, pretrained on 13,000 hours of unlabeled audio and fine-tuned on 250 hours of labeled data
Downloads 6,868
Release Time: 3/2/2022

Model Overview

This model is an end-to-end Vietnamese speech recognition system built on Facebook's wav2vec 2.0 architecture and fine-tuned with the Connectionist Temporal Classification (CTC) objective for Vietnamese speech-to-text.
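
To make the CTC step concrete: a CTC model emits one token id per audio frame, and the simplest decoder collapses consecutive repeats and then drops the blank token. The sketch below illustrates that greedy rule only. The vocabulary, the frame sequence, and the choice of id 0 as the blank are assumptions for illustration and are not taken from this model.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame ids, then drop blanks (greedy CTC decoding)."""
    decoded = []
    previous = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            decoded.append(id_to_token[token_id])
        previous = token_id
    return "".join(decoded)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "x", 2: "o", 3: "n", 4: "g"}

# The blank between the two runs of id 2 keeps the double "o" intact.
frames = [1, 1, 0, 2, 0, 2, 2, 3, 4]
print(ctc_greedy_decode(frames, vocab))
```

The blank is what lets CTC represent genuinely doubled letters: without it, the two runs of "o" would merge into one.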

Model Features

Large-scale Pretraining
Pretrained on 13,000 hours of unlabeled Vietnamese YouTube audio
Efficient Fine-tuning
Fine-tuned on 250 hours of labeled speech data for the speech recognition task
Supports Language Model Integration
Can be combined with a 4-gram language model to reduce word error rate (WER)
End-to-End Solution
Maps audio directly to text with a single neural network, so no separate acoustic model, pronunciation lexicon, or forced alignment is required; the external language model is optional
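
One common way to combine an acoustic model with a language model is to rescore candidate transcripts: each hypothesis gets its acoustic log-probability plus a weighted language-model log-probability, and the best combined score wins. The sketch below shows that general idea only. The scoring function, the bigram table, the weights, and the candidate scores are all hypothetical, and it does not reproduce the 4-gram model or the decoder distributed with this model.

```python
import math


def rescore(hypotheses, lm_logprob, lm_weight=0.5):
    """Pick the best transcript from (text, acoustic_logprob) pairs."""
    def combined(item):
        text, acoustic = item
        return acoustic + lm_weight * lm_logprob(text)
    return max(hypotheses, key=combined)[0]


# Hypothetical toy language model: known word pairs are likely,
# every other pair is heavily penalized.
KNOWN_BIGRAMS = {("xin", "chào"), ("chào", "bạn")}


def toy_lm_logprob(text):
    words = text.split()
    return sum(
        math.log(0.5) if pair in KNOWN_BIGRAMS else math.log(0.01)
        for pair in zip(words, words[1:])
    )


# Made-up acoustic scores: the misspelled candidate is slightly preferred
# by the acoustic model alone.
candidates = [("xin chao bạn", -4.0), ("xin chào bạn", -4.5)]

without_lm = rescore(candidates, toy_lm_logprob, lm_weight=0.0)
with_lm = rescore(candidates, toy_lm_logprob, lm_weight=0.5)
print(without_lm, "|", with_lm)
```

With the weight set to zero the acoustic favorite wins; with the language model switched on, the correctly accented phrase overtakes it.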

Model Capabilities

Vietnamese speech recognition
Audio-to-text conversion
Processes audio sampled at 16 kHz
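
The following is a minimal transcription sketch under stated assumptions, not the author's reference code. It assumes the model is published on the Hugging Face Hub as `nguyenvulebinh/wav2vec2-base-vietnamese-250h`, that `torch` and `transformers` are installed, and that the input is a 16-bit mono PCM WAV file. The helper names `read_wav` and `transcribe` are made up for this example. Audio at any other rate is rejected rather than silently resampled.

```python
import array
import sys
import wave

MODEL_ID = "nguyenvulebinh/wav2vec2-base-vietnamese-250h"  # assumed Hub id
TARGET_RATE = 16_000  # sample rate the model expects, in Hz


def read_wav(path):
    """Read a 16-bit mono PCM WAV; return (float samples in [-1, 1), rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wav.getframerate()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian
    return [sample / 32768.0 for sample in pcm], rate


def transcribe(path):
    """Greedy (no language model) transcription of one WAV file."""
    samples, rate = read_wav(path)
    if rate != TARGET_RATE:
        raise ValueError(f"resample to {TARGET_RATE} Hz first (got {rate} Hz)")

    # Heavy dependencies are imported lazily so the checks above run without them.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    inputs = processor(samples, sampling_rate=TARGET_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

A call such as `transcribe("recording.wav")` (the file name is a placeholder) would download the weights on first use and return the decoded text.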

Use Cases

Speech Transcription
Meeting Minutes
Convert Vietnamese meeting recordings into text transcripts
Reported word error rate: 6.15% on the VIVOS test set (with the 4-gram language model)
Voice Assistants
Provides speech recognition capability for Vietnamese voice assistants
Reported word error rate: 11.52% on the Common Voice Vietnamese test set (with the 4-gram language model)
Educational Applications
Language Learning
Helps learners practice Vietnamese pronunciation and listening
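
The word error rates quoted under Use Cases follow the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between the reference and the recognized transcript, divided by the number of reference words. The sketch below computes that quantity in plain Python. The example sentences are invented, and the benchmark numbers above may have been produced with different tooling and text normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one of five reference words is missing.
wer = word_error_rate("tôi đang học tiếng việt", "tôi đang học tiếng")
print(f"{wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.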