Wav2vec2 Large Xls R 300m Cantonese

Developed by ivanlau
An automatic speech recognition (ASR) model for Cantonese (Hong Kong), fine-tuned from the facebook/wav2vec2-xls-r-300m base model.
Downloads 42
Release Time: 3/2/2022

Model Overview

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on the zh-HK subset of Common Voice 8.0 (mozilla-foundation/common_voice_8_0). It is intended for Cantonese (Hong Kong) speech recognition.

Model Features

Cantonese speech recognition
Speech recognition capability specifically optimized for Hong Kong Cantonese
Based on XLS-R architecture
Built on Facebook's wav2vec2-xls-r-300m, a pretrained multilingual speech model, which supplies the speech feature extraction layers
Multi-dataset evaluation
Evaluated on multiple datasets including Common Voice 8 and Robust Speech Event

Model Capabilities

Cantonese speech-to-text
Automatic speech recognition
Speech content transcription
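
The capabilities listed above can be tried with the Hugging Face transformers ASR pipeline. The following is a minimal sketch, not the author's documented usage. The Hub identifier is an assumption inferred from the model name and developer shown on this page, and transcribe is a hypothetical helper name.

```python
# Assumed Hub identifier, inferred from the model name and developer;
# this page does not confirm it.
MODEL_ID = "ivanlau/wav2vec2-large-xls-r-300m-cantonese"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcript of one audio file as plain text."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    # The checkpoint is downloaded on first use and cached locally.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # Given a file path, the pipeline decodes the audio (ffmpeg must be
    # available) and resamples it to the rate the model expects.
    result = asr(audio_path)
    return result["text"]
```

Calling transcribe("sample_cantonese.wav"), where the file name is a hypothetical local recording, would return the recognized text as a string. Building the pipeline once and reusing it is preferable when transcribing many files, since each call here reloads the model.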

Use Cases

Speech transcription
Cantonese speech content transcription
Convert Cantonese speech content into text
Achieved a word error rate (WER) of 0.8111 and a character error rate (CER) of 0.2196 on the Common Voice 8 test set
Voice assistant
Cantonese voice command recognition
Recognize and understand Cantonese voice commands
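
The WER and CER figures quoted above for Common Voice 8 are both edit-distance ratios: the number of insertions, deletions, and substitutions needed to turn the model output into the reference, divided by the reference length. The sketch below shows one common way to compute them. It assumes whitespace-separated tokens for WER and ignores spaces for CER; how the reported scores were tokenized and normalized is not stated on this page, so this may differ from the original evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (an assumption)."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces (an assumption)."""
    ref = reference.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hypothesis.replace(" ", "")) / len(ref)
```

For an illustrative reference "我 今日 好 開心" and hypothesis "我 今日 好 開森", one of four tokens is wrong, so wer returns 0.25, while only one of six characters differs, so cer returns about 0.167. A single wrong character makes its whole token count as an error, which is one reason WER can sit well above CER.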