Wav2vec2 Xls R 300m Zh HK Lm V2

Developed by w11wo
An automatic speech recognition model based on XLS-R architecture, optimized for Cantonese (zh-HK), fine-tuned on the Common Voice dataset and enhanced with a 5-gram language model.
Downloads 25
Release Time: 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) model optimized for Cantonese (zh-HK), fine-tuned from Facebook's Wav2Vec2-XLS-R-300M architecture and integrated with a 5-gram language model trained on the PyCantonese corpus to improve recognition accuracy.
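A minimal sketch of how a model like this might be loaded for transcription with the Hugging Face `transformers` pipeline. The repository ID `w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2` is an assumption inferred from the model name and developer listed here, and `transcribe` is a hypothetical helper. Decoding with the bundled language model generally also requires the `pyctcdecode` and `kenlm` packages to be installed.

```python
# Assumed repository ID, inferred from the model name and developer above.
MODEL_ID = "w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file (hypothetical helper, not part of any library)."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the model on first use, so this call needs network access.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py path/to/cantonese_clip.wav
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

Wav2Vec2 models expect 16 kHz audio. When given a file path, the pipeline decodes and resamples the audio, which requires `ffmpeg` to be available.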

Model Features

Cantonese Optimization
A speech recognition model specifically optimized for Cantonese (zh-HK), fine-tuned on the Common Voice Cantonese dataset.
5-gram Language Model Enhancement
Integrated with a 5-gram language model trained on the PyCantonese corpus, which is applied during decoding to improve recognition accuracy.
Robust Performance
Entered in Hugging Face's Robust Speech Challenge, demonstrating stable performance across different datasets.
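To illustrate what "5-gram" means here: an n-gram language model scores the next token given the previous n-1 tokens, so a 5-gram model conditions on four tokens of context. The sketch below is a toy maximum-likelihood estimate over a token list. It is not the model's actual language model, which would typically use smoothing and a much larger corpus, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def conditional_prob(tokens, context, word):
    """Unsmoothed estimate of P(word | context) from raw counts."""
    context = tuple(context)
    counts = ngram_counts(tokens, len(context) + 1)
    numerator = counts[context + (word,)]
    # Total occurrences of the context followed by any token.
    denominator = sum(c for gram, c in counts.items() if gram[:-1] == context)
    return numerator / denominator if denominator else 0.0


# Toy corpus: the context "a b c d" is followed once by "e" and once by "f".
corpus = list("abcdeabcdf")
p = conditional_prob(corpus, list("abcd"), "e")  # 5-gram: 4 context tokens + 1
```

In speech recognition, scores of this kind are combined with the acoustic model's output during beam search, so that candidate transcripts forming more plausible token sequences are preferred.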

Model Capabilities

Cantonese Speech Recognition
Automatic Speech-to-Text
Support for Multiple Speech Datasets

Use Cases

Speech Transcription
Cantonese Speech-to-Text
Convert Cantonese speech content into text transcripts
Achieved a character error rate (CER) of 24.09% on the Common Voice dataset
Speech Application Development
Cantonese Voice Assistant
Develop voice interaction applications supporting Cantonese
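For context on the CER figure reported above: character error rate is conventionally the character-level edit distance between the reference and the predicted transcript, divided by the reference length. The following is a minimal sketch under that standard definition. Stripping spaces before scoring is an assumption made here, and the evaluation behind the reported number may have normalized text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    # Assumption: spaces are ignored when scoring.
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five gives a CER of 0.2 (20%).
score = cer("我哋去食飯", "我地去食飯")
```

Because insertions count as errors, CER can exceed 100% when the hypothesis is much longer than the reference.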