Wav2vec2 Large Xlsr 53 Chinese Zh Cn

Developed by jonatasgrosman
A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53. It expects audio input sampled at 16 kHz.
Downloads 3.8M
Release Time: 3/2/2022

Model Overview

This model is the XLSR-53 large model fine-tuned for Chinese (zh-CN) speech recognition. It transcribes Chinese speech to text.

Model Features

Multi-dataset Fine-tuning
Fine-tuned using multiple Chinese speech datasets including Common Voice 6.1, CSS10, and ST-CMDS
No Language Model Required
Can be used directly without additional language model support
16kHz Sampling Rate Support
Expects audio input sampled at 16 kHz; audio recorded at other rates should be resampled before inference
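
The "No Language Model Required" feature can be illustrated with greedy CTC decoding, which is how wav2vec2 models fine-tuned for recognition are commonly decoded when no language model is attached. The sketch below is not the model's own code: the three-entry vocabulary, the per-frame scores, and the choice of index 0 as the blank token are all hypothetical values picked for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,  # assumption: index 0 is the CTC blank token
) -> str:
    """Decode per-frame scores to text without any language model."""
    # 1. Pick the highest-scoring token for every audio frame.
    best_ids: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # 2. Collapse runs of the same token into a single token.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blank tokens and map the remaining ids to characters.
    return "".join(vocab[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary and scores (5 frames, 3 tokens).
toy_vocab = ["<blank>", "你", "好"]
toy_scores = [
    [0.1, 0.8, 0.1],    # 你
    [0.2, 0.7, 0.1],    # 你 (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # 好
    [0.1, 0.1, 0.8],    # 好 (repeat, collapsed)
]
print(greedy_ctc_decode(toy_scores, toy_vocab))  # prints 你好
```

With the real model, the per-frame scores would come from the network output and the vocabulary from its tokenizer; only the decoding step is shown here.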

Model Capabilities

Chinese Speech Recognition
Speech-to-Text

Use Cases

Speech Transcription
Convert Chinese speech to text
Achieves a character error rate (CER) of 19.03% on the Common Voice zh-CN test set
Voice Assistants
Voice Command Recognition
Recognize Chinese voice commands
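
For readers who want to interpret the CER figure quoted under Speech Transcription, the following is a minimal sketch of how a character error rate is usually computed: the character-level edit distance between the reference and the hypothesis, divided by the reference length. The function names and the example sentences are hypothetical, and the page does not state which tool or text normalization produced the 19.03% result, so this should not be read as the evaluation script.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance counted in characters."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for a single utterance."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character out of six.
print(cer("今天天气很好", "今天天汽很好"))
```

A test-set CER is commonly reported by summing edit distances over all utterances and dividing by the total number of reference characters, rather than by averaging per-utterance rates.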