Wav2vec2 Large Xlsr 53 Vietnamese

Developed by not-tanh
A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Downloads 22
Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Vietnamese. It is based on the XLSR-53 architecture and fine-tuned on the Common Voice, VIVOS, and FOSD datasets.

Model Features

Multi-dataset fine-tuning
Fine-tuned on three Vietnamese datasets (Common Voice, VIVOS, and FOSD) to improve recognition accuracy.
No language model required
Can be used directly without additional language model support.
16kHz sampling rate support
Optimized for 16kHz sampling rate audio input.
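
Running without a language model usually means the CTC output is decoded greedily: pick the highest-scoring token in each frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python. The toy vocabulary, the blank index of 0, and the "|" word delimiter are assumptions made for illustration (they mirror common wav2vec2 tokenizer conventions) and are not taken from this model's actual configuration.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: List[str],
    blank_id: int = 0,          # assumed index of the CTC blank token
    word_delimiter: str = "|",  # assumed word-boundary symbol
) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Best token id for every frame.
    best_ids = [max(range(len(row)), key=lambda k: row[k]) for row in frame_scores]

    chars = []
    previous = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[token_id])
        previous = token_id

    # Turn the word delimiter into ordinary spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()
```

A blank frame between two identical tokens is what lets a doubled letter survive the collapse step, which is why the blank is removed only after repeats have been merged.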

Model Capabilities

Vietnamese speech recognition
Audio to text conversion
Speech transcription
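
Converting audio to text with this model might look like the sketch below. It assumes the Hugging Face transformers library (plus ffmpeg for decoding audio files) is installed, and it assumes the model is published under the ID shown in MODEL_ID, which is inferred from the developer name and model title rather than stated on this page. The helper names are hypothetical. The sample-rate check uses only the standard library and reflects the 16 kHz requirement described above.

```python
import wave

EXPECTED_RATE = 16_000  # the model expects 16 kHz audio

# Assumed model ID, inferred from the developer name and title on this page.
MODEL_ID = "not-tanh/wav2vec2-large-xlsr-53-vietnamese"


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe a 16 kHz WAV file; raise if the rate does not match."""
    rate = wav_sample_rate(path)
    if rate != EXPECTED_RATE:
        raise ValueError(f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz")

    # Imported lazily so the rate check works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(path)["text"]
```

Rejecting mismatched files up front is a deliberate choice in this sketch: it makes the 16 kHz assumption explicit instead of relying on silent resampling.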

Use Cases

Speech transcription
Vietnamese speech to text
Convert Vietnamese speech content into text
Word Error Rate: 39.57%
Voice assistants
Vietnamese voice command recognition
Used for command recognition in Vietnamese voice assistant systems
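
For context on the reported figure, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. A value of 39.57% therefore corresponds to roughly 40 word-level errors per 100 reference words. The sketch below shows the standard calculation; the function name is hypothetical, and this page does not say which test set or scoring tool produced the reported number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current

    return previous[-1] / len(ref_words)
```

Because insertions count as errors but do not add to the reference length, the rate can exceed 100% when the output is much longer than the reference.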