Chunkformer Large Vie

Developed by khanhld
A large Vietnamese automatic speech recognition (ASR) model based on the ChunkFormer architecture, fine-tuned on approximately 3000 hours of publicly available Vietnamese speech data.
Downloads 1,765
Release Time: 2/1/2025

Model Overview

ChunkFormer-Large-Vie is an automatic speech recognition model for Vietnamese, built on the ChunkFormer architecture. It reports strong results on several public datasets, including Common Voice Vi and VIVOS.

Model Features

High-performance Vietnamese recognition
Reports state-of-the-art results on the Common Voice Vi and VIVOS datasets, with word error rates (WER) of 6.66% and 4.18%, respectively.
Long audio processing capability
Transcribes long audio by processing it in chunks, which reduces memory usage and improves computational efficiency.
Multi-dataset training
Trained on approximately 3000 hours of diverse Vietnamese speech data, covering various scenarios and accents.
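
For readers unfamiliar with the WER metric quoted above, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the evaluation script and text normalization actually used for this model are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only (hypothetical name); tokenization is a plain
    whitespace split with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words gives a WER of 0.25 (25%).
print(word_error_rate("xin chào các bạn", "xin chào bạn"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.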

Model Capabilities

Vietnamese speech recognition
Long audio transcription
Real-time speech-to-text
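
The idea behind chunked processing of long audio can be sketched as follows. This is a simplified illustration, not the ChunkFormer implementation: it only computes index ranges that split a long recording into fixed-size chunks, each padded with some neighbouring context. The function name `iter_chunks` and the parameters `chunk_size` and `context` are hypothetical, and the sizes in the usage lines are arbitrary example values.

```python
from typing import Iterator, Tuple


def iter_chunks(
    num_samples: int, chunk_size: int, context: int
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (ctx_start, start, end, ctx_end) index tuples for each chunk.

    [start, end) is the region the chunk is responsible for, and
    [ctx_start, ctx_end) additionally includes up to `context` samples on
    each side so a model can see some surrounding audio.
    Illustrative sketch only; all names here are hypothetical.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if context < 0:
        raise ValueError("context must be non-negative")

    for start in range(0, num_samples, chunk_size):
        end = min(start + chunk_size, num_samples)
        ctx_start = max(0, start - context)
        ctx_end = min(num_samples, end + context)
        yield ctx_start, start, end, ctx_end


# Example: 60 s of 16 kHz audio, 10 s chunks, 1 s of context on each side.
sample_rate = 16_000
for ctx_start, start, end, ctx_end in iter_chunks(
    60 * sample_rate, 10 * sample_rate, 1 * sample_rate
):
    print(ctx_start, start, end, ctx_end)
```

With a scheme like this, peak memory depends on the chunk and context sizes rather than on the total length of the recording, which is the general motivation for chunk-based processing.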

Use Cases

Speech transcription
Meeting minutes
Automatically transcribe Vietnamese meeting recordings into text records
Highly accurate transcription results
Voice assistant
Provide speech recognition capabilities for Vietnamese voice assistants
Low-latency, high-accuracy recognition
Education
Language learning
Help learners practice Vietnamese pronunciation and listening
Provide accurate pronunciation evaluation