Wav2vec2 Large Vi Vlsp2020

Developed by nguyenvulebinh
A Vietnamese automatic speech recognition model based on the wav2vec2 architecture, pre-trained on 13,000 hours of unlabeled YouTube audio and fine-tuned on 250 hours of labeled data.
Downloads 385
Release Time: 11/4/2022

Model Overview

This model is designed for Vietnamese speech recognition. It takes audio sampled at 16 kHz as input and outputs transcribed text. It is available in base and large versions, and it can be combined with a language model to improve recognition accuracy.

Model Features

Large-scale Pre-training
Pre-trained with 13,000 hours of Vietnamese YouTube audio to learn rich speech feature representations
Domain Fine-tuning
Fine-tuned on 250 hours of labeled data from the VLSP ASR dataset to optimize Vietnamese recognition performance
Language Model Integration
Supports integration with a 5-gram language model, which reduces the word error rate (WER)
High Performance
Achieves a word error rate of 5.32% on the VLSP T1 test set (when using a language model)
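
To make the word error rate figure concrete, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions. The listing does not say how the reported 5.32% was scored, so any text normalization applied before scoring (casing, punctuation) is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not part of the model's own tooling). Words are
    obtained by whitespace splitting, with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is dropped, so WER is 1 / 4.
    print(word_error_rate("xin chào các bạn", "xin chào bạn"))
```

Under this definition, a WER of 5.32% corresponds to roughly five word errors per hundred reference words.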

Model Capabilities

Vietnamese speech recognition
Audio transcription
Processes audio sampled at 16 kHz
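
Because the model expects 16 kHz input, it can help to check a recording's sample rate before transcription. The following is a minimal sketch using only Python's standard `wave` module. The helper names (`wav_sample_rate`, `needs_resampling`, `make_silent_wav`) are hypothetical and are not part of the model's tooling. The sketch only detects a mismatch; the resampling itself would be done with an audio library of your choice.

```python
import io
import wave

EXPECTED_RATE = 16_000  # sample rate given in the model description


def wav_sample_rate(source):
    """Return the sample rate in Hz stored in a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def needs_resampling(source, expected=EXPECTED_RATE):
    """Return True when the recording's rate differs from the expected one."""
    return wav_sample_rate(source) != expected


def make_silent_wav(rate, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        clip = make_silent_wav(rate)
        print(rate, "needs resampling:", needs_resampling(clip))
```

In practice, `source` would be the path to a real recording. Files in other container formats (MP3, M4A) would need to be decoded to WAV first, which is outside the scope of this sketch.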

Use Cases

Speech Transcription
Vietnamese Meeting Minutes
Automatically transcribe Vietnamese meeting recordings into text records
Accuracy exceeds 93% (when using a language model)
Media Subtitle Generation
Automatically generate subtitles for Vietnamese video content
Voice Assistants
Vietnamese Voice Command Recognition
Used as the front-end speech recognition module for Vietnamese voice assistants