Xls Asr Vi 40h 1B

Developed by geninhu
A Vietnamese automatic speech recognition model based on facebook/wav2vec2-xls-r-1b, fine-tuned on 40 hours of data from the FPT Open Speech Dataset (FOSD) and Common Voice 7.0.
Downloads 23
Release Time: 3/2/2022

Model Overview

This model targets Vietnamese automatic speech recognition (ASR). Although fine-tuned on only 40 hours of labeled speech, it reaches 25.846% WER on the Common Voice 7.0 test set, and it supports language model integration to improve recognition accuracy further.

Model Features

Efficient fine-tuning
Fine-tunes a large pre-trained model on only 40 hours of Vietnamese data, which keeps the labeled-data requirement low
Language model support
Can be combined with a 4-gram language model to reduce word error rate (WER) and character error rate (CER)
Multi-dataset validation
Evaluated on multiple Vietnamese datasets, including VIVOS, Common Voice 7.0, and Common Voice 8.0
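
To illustrate how a language model can lower the error rate, here is a minimal sketch of hypothesis rescoring. It is not the model's actual decoder: it swaps the 4-gram model for a toy add-one-smoothed bigram model to stay short, and the corpus, the acoustic scores, and the weight `alpha` are all made-up values for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over whitespace-tokenised sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)


def lm_logprob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def rescore(hypotheses, lm, alpha=0.5):
    """Return the text maximising acoustic_score + alpha * lm_score."""
    contexts, bigrams, vocab_size = lm
    return max(
        hypotheses,
        key=lambda h: h[1] + alpha * lm_logprob(h[0], contexts, bigrams, vocab_size),
    )[0]


# Toy data (hypothetical): two in-domain sentences and two candidate transcripts
# with invented acoustic log-scores. The first candidate drops a diacritic.
lm = train_bigram(["xin chào các bạn", "xin cảm ơn các bạn"])
hypotheses = [("xin chao các bạn", -4.0), ("xin chào các bạn", -4.2)]

acoustic_only = rescore(hypotheses, lm, alpha=0.0)  # picks the higher acoustic score
with_lm = rescore(hypotheses, lm, alpha=0.5)        # LM favours the seen word sequence
print(acoustic_only)
print(with_lm)
```

In this toy case the acoustic score alone prefers the candidate with the missing diacritic, while the combined score prefers the candidate whose word sequence the language model has seen.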

Model Capabilities

Vietnamese speech recognition
Speech-to-text
Language model integration support
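
As a rough picture of the speech-to-text step without a language model, the sketch below shows greedy CTC decoding, which is the usual way wav2vec 2.0 based recognizers turn per-frame predictions into text. It assumes this model follows that convention; the vocabulary, the blank id, and the `|` word delimiter are illustrative assumptions, not values read from the model's files.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join tokens into text."""
    collapsed = []
    previous = None
    for idx in frame_ids:
        # Consecutive identical predictions count as a single emission.
        if idx != previous:
            collapsed.append(idx)
        previous = idx
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical character vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "|", 2: "x", 3: "i", 4: "n", 5: "c", 6: "h", 7: "à", 8: "o"}

# Per-frame argmax ids, as they might come out of the acoustic model.
frames = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 6, 6, 0, 7, 8, 8, 0]
text = ctc_greedy_decode(frames, vocab)
print(text)
```

The blank token is what lets the decoder tell a genuinely doubled character (`x`, blank, `x`) apart from one character held across several frames (`x`, `x`, `x`).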

Use Cases

Speech transcription
Vietnamese speech transcription
Converts Vietnamese speech into text
Achieves 25.846% WER on the Common Voice 7.0 test set
Voice assistants
Vietnamese voice command recognition
Serves as the speech recognition front end for Vietnamese voice assistants
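
For readers comparing their own transcripts against the WER quoted above, the following sketch shows how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, over words or characters respectively. The sentences are invented examples, and the evaluation behind the published figure may apply text normalization that this sketch does not.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        current = [i]
        for j, h in enumerate(hyp, 1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (r != h),  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Invented example: one substituted word and one dropped word out of five.
reference = "tôi đi học hôm nay"
hypothesis = "tôi đi hóc hôm"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can score above 100%.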