
Wav2vec2 Large Xlsr Vietnamese

Developed by Nhut
Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53
Downloads 22
Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Vietnamese. It uses the XLSR Wav2Vec2 architecture and was fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice, FOSD, and VIVOS datasets.

Model Features

Multi-dataset fine-tuning
Fine-tuned on three Vietnamese datasets (Common Voice, FOSD, and VIVOS) rather than a single corpus, to broaden the range of speech the model handles
16 kHz sampling rate support
Expects speech input sampled at 16 kHz
No language model required
Can be used directly on its own output; no separate language model is needed for decoding
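
Decoding without a language model usually means greedy CTC decoding: take the best-scoring token for each audio frame, merge consecutive repeats, and drop the blank token. The following is a minimal sketch of that idea in plain Python, not the model's actual implementation. The function name, the toy vocabulary, the blank id of 0, and the "|" word delimiter are assumptions chosen for illustration.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,          # assumption: the blank/pad token sits at index 0
    word_delimiter: str = "|",  # assumption: "|" marks word boundaries
) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the best-scoring token id for each frame.
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    chars: List[str] = []
    previous = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame's token
        # and is not the blank; a blank between two equal tokens keeps both.
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[token_id])
        previous = token_id

    # Turn word delimiters into spaces and normalize whitespace.
    return " ".join("".join(chars).replace(word_delimiter, " ").split())


# Toy example: frames whose best ids are [2, 2, 0, 2, 1, 3, 3].
toy_vocab = ["<pad>", "|", "a", "b"]
toy_ids = [2, 2, 0, 2, 1, 3, 3]
toy_frames = [
    [1.0 if k == i else 0.0 for k in range(len(toy_vocab))] for i in toy_ids
]
print(ctc_greedy_decode(toy_frames, toy_vocab))  # prints "aa b"
```

The blank between the second and third "a" frames is what lets the decoder keep a doubled letter instead of merging it away.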

Model Capabilities

Vietnamese speech recognition
Automatic speech-to-text conversion
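
A speech-to-text call for a fine-tuned Wav2Vec2 checkpoint like this one might look like the sketch below. It assumes the `transformers`, `torch`, and `torchaudio` packages are installed, and that the checkpoint is published on the Hugging Face Hub under the id `Nhut/wav2vec2-large-xlsr-vietnamese`. That id is inferred from the developer and model names on this page and is not confirmed by it. The `transcribe` function name and the audio path are hypothetical.

```python
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input
MODEL_ID = "Nhut/wav2vec2-large-xlsr-vietnamese"  # assumed Hub id, not confirmed


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy decoding (no language model)."""
    # Heavy third-party imports stay local so this file imports without them.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # torchaudio.load returns a (channels, samples) tensor and the file's rate.
    waveform, sample_rate = torchaudio.load(audio_path)
    waveform = waveform.mean(dim=0)  # downmix to mono
    if sample_rate != TARGET_SAMPLE_RATE:
        # Resample anything that is not already 16 kHz.
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(),
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: best token per frame, then CTC collapse in the tokenizer.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` with a hypothetical local file would download the checkpoint on first use and return the decoded text.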

Use Cases

Speech transcription
Vietnamese speech transcription
Converts Vietnamese speech into text
Reported result: 49.59% word error rate (WER) on the Common Voice Vietnamese test set
Voice assistants
Vietnamese voice command recognition
Recognizes spoken commands for Vietnamese voice assistants and smart home devices
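
The 49.59% WER quoted for the Common Voice Vietnamese test set is a word error rate: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below shows that calculation for a single sentence pair in plain Python. The function name and the example sentences are illustrative and are not taken from the model's evaluation script. Reported scores are normally computed over a whole test set by summing errors and reference words across utterances.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five reference words.
print(word_error_rate("tôi đang học tiếng việt", "tôi đang học tiếng anh"))  # prints 0.2
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.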