
Wav2vec2 Base Vi Vlsp2020

Developed by nguyenvulebinh
A Vietnamese automatic speech recognition model based on the wav2vec2 architecture, pre-trained on 13,000 hours of unlabeled YouTube audio and fine-tuned on 250 hours of labeled data.
Downloads: 262
Release date: 11/4/2022

Model Overview

This model is specifically designed for Vietnamese automatic speech recognition (ASR) and supports decoding with a language model to improve accuracy.
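The card does not show the inference pipeline. wav2vec2 ASR models are usually fine-tuned with a CTC (connectionist temporal classification) output layer that emits one token distribution per audio frame. Assuming this model follows that convention, the simplest decoding without a language model is greedy CTC decoding: take the most likely token in each frame, merge consecutive repeats, then drop blanks. This is a pure-Python sketch. The character vocabulary, the blank index 0, the `|` word delimiter, and the `frames_from_ids` helper are hypothetical stand-ins, not the model's actual tokenizer or API.

```python
from typing import List, Sequence

# Hypothetical character vocabulary for illustration only; the real model's
# tokenizer defines its own symbols and indices.
VOCAB = ["<blank>", "|", "x", "i", "n", "c", "h", "à", "o"]


def ctc_greedy_decode(
    frame_log_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
    word_delim: str = "|",
) -> str:
    """Greedy (best-path) CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely token index for each audio frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_log_probs]
    tokens: List[str] = []
    prev = None
    for idx in best_path:
        # Keep a token only if it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            tokens.append(vocab[idx])
        prev = idx
    # Turn word delimiters into spaces and normalize whitespace.
    text = "".join(tokens).replace(word_delim, " ")
    return " ".join(text.split())


def frames_from_ids(ids: Sequence[int], vocab_size: int) -> List[List[float]]:
    """Hypothetical helper: fake per-frame log-probabilities whose argmax is the given id."""
    return [[-0.1 if t == i else -5.0 for t in range(vocab_size)] for i in ids]


# Frames spell "x i n | c h à o" with repeats and blanks, as CTC output often does.
frames = frames_from_ids([0, 2, 2, 3, 0, 4, 1, 1, 5, 6, 6, 0, 7, 8, 8, 0], len(VOCAB))
print(ctc_greedy_decode(frames, VOCAB))  # xin chào
```

Merging repeats before removing blanks is what lets CTC produce a genuine double letter: the model has to separate the two occurrences with a blank frame.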

Model Features

Large-scale Pre-training
Self-supervised pre-training on 13,000 hours of unlabeled Vietnamese YouTube audio
High-precision Fine-tuning
Fine-tuned on 250 hours of labeled data from the VLSP ASR dataset
Language Model Integration
Supports decoding with a 5-gram language model, which lowers the word error rate (WER) compared with decoding without a language model
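The card says decoding can use a 5-gram language model but does not say how its scores are combined with the acoustic model's. A common scheme in CTC beam-search decoders (for example pyctcdecode, which loads KenLM models) is shallow fusion: each hypothesis W is scored as log P_ac(W) + alpha * log P_LM(W) + beta * |W|, where alpha weights the language model and beta is a per-word insertion bonus. This sketch applies that formula to rescore a short n-best list rather than running a full beam search. It also swaps in a toy word n-gram model with add-k smoothing for a real KenLM model, which would typically use modified Kneser-Ney smoothing. The corpus, acoustic scores, alpha, and beta are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


class ToyNGramLM:
    """Hypothetical word n-gram LM with add-k smoothing (a simplification, not Kneser-Ney)."""

    def __init__(self, sentences: Sequence[str], n: int = 5, k: float = 0.1) -> None:
        self.n = n
        self.k = k
        self.ngram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        self.vocab = {"</s>"}
        for sentence in sentences:
            words = sentence.split()
            self.vocab.update(words)
            for context, word in self._ngrams(words):
                self.ngram_counts[context + (word,)] += 1
                self.context_counts[context] += 1

    def _ngrams(self, words: List[str]) -> Iterator[Tuple[Tuple[str, ...], str]]:
        # Pad with n-1 sentence-start markers and one end marker.
        padded = ["<s>"] * (self.n - 1) + words + ["</s>"]
        for i in range(self.n - 1, len(padded)):
            yield tuple(padded[i - self.n + 1 : i]), padded[i]

    def log_prob(self, sentence: str) -> float:
        """Natural-log probability of the sentence, summed over its n-grams."""
        v = len(self.vocab)
        total = 0.0
        for context, word in self._ngrams(sentence.split()):
            num = self.ngram_counts[context + (word,)] + self.k
            den = self.context_counts[context] + self.k * v
            total += math.log(num / den)
        return total


def rescore(
    hypotheses: Sequence[Tuple[str, float]],
    lm: ToyNGramLM,
    alpha: float = 0.5,
    beta: float = 1.0,
) -> str:
    """Return the hypothesis text maximizing acoustic + alpha * LM + beta * word count."""

    def fused_score(hyp: Tuple[str, float]) -> float:
        text, acoustic_log_prob = hyp
        return acoustic_log_prob + alpha * lm.log_prob(text) + beta * len(text.split())

    return max(hypotheses, key=fused_score)[0]


corpus = ["tôi đi học", "tôi đi học mỗi ngày", "hôm nay tôi đi học"]
lm = ToyNGramLM(corpus, n=5)
# Hypothetical n-best list: the acoustic model slightly prefers the misspelling.
nbest = [("tôi đi học", -4.2), ("tôi đi hộc", -4.0)]
print(rescore(nbest, lm))  # tôi đi học
```

With alpha set to 0, the misspelled hypothesis wins on acoustic score alone; the language model term is what flips the decision. In practice alpha and beta are tuned on held-out data.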

Model Capabilities

Vietnamese speech recognition
Speech decoding with language model

Use Cases

Speech Transcription
Vietnamese Speech to Text
Transcribes spoken Vietnamese into written text
Test-set WER as low as 5.32% with language model decoding
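The card reports a test-set WER of 5.32% but does not name the test set or the text normalization used. For reference, WER is the word-level edit distance between the hypothesis and the reference divided by the number of reference words: WER = (S + D + I) / N, with S substitutions, D deletions, and I insertions. A WER of 5.32% therefore corresponds to roughly 5.32 errors per 100 reference words. A minimal pure-Python sketch, splitting on whitespace (for Vietnamese text, this yields syllables):

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + sub,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One deletion ("tôi") and one insertion ("rồi") over 5 reference words.
print(word_error_rate("hôm nay tôi đi học", "hôm nay đi học rồi"))  # 0.4
```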