Asr Wav2vec2 Transformer Aishell

Developed by speechbrain
This is a Transformer-based automatic speech recognition model that builds on a pre-trained wav2vec2 encoder and is trained on the AISHELL dataset. It is designed for Mandarin speech recognition.
Downloads 99
Release Time: 3/2/2022

Model Overview

This model is an end-to-end automatic speech recognition system that combines a wav2vec2 encoder with a CTC+Transformer joint decoder, suitable for Mandarin speech transcription.
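
As a rough illustration of what joint CTC+Transformer decoding means, the sketch below interpolates a CTC log-probability and an attention-decoder log-probability for each candidate transcript and keeps the best one. It is a simplification under stated assumptions: real joint decoders usually apply this interpolation to partial hypotheses at every beam-search step, while this example only rescores complete hypotheses. The function names, the weight of 0.4, and the probabilities are hypothetical and are not taken from this model's configuration.

```python
import math


def joint_score(ctc_logprob, att_logprob, ctc_weight=0.4):
    """Interpolate CTC and attention-decoder log-probabilities for one hypothesis.

    ctc_weight is an assumed value for illustration, not the model's setting.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_logprob + (1.0 - ctc_weight) * att_logprob


def rerank(hypotheses, ctc_weight=0.4):
    """Sort (text, ctc_logprob, att_logprob) tuples best-first by joint score."""
    return sorted(
        hypotheses,
        key=lambda h: joint_score(h[1], h[2], ctc_weight),
        reverse=True,
    )


# Made-up scores for two competing transcripts of the same utterance.
candidates = [
    ("你好世界", math.log(0.30), math.log(0.50)),
    ("你好视界", math.log(0.45), math.log(0.20)),
]
best = rerank(candidates)[0][0]
print(best)
```

With the assumed weight, the attention decoder's preference outweighs the CTC branch's preference here. Setting `ctc_weight=1.0` would rank the candidates by CTC score alone.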

Model Features

End-to-end speech recognition
Provides a complete end-to-end solution from audio input to text output.
wav2vec2 pre-training
Uses a pre-trained wav2vec2 encoder for feature extraction to improve recognition accuracy.
CTC+Transformer joint decoding
Combines CTC probabilities with the Transformer decoder's output during decoding to improve recognition accuracy.
Supports 16 kHz audio
Automatically processes mono audio input with a 16 kHz sampling rate.
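
To make the 16 kHz mono input format concrete, here is a minimal, dependency-free sketch of converting a recording to that format by hand: channels are averaged into one, and the signal is resampled by linear interpolation. This is not the model's own preprocessing. The helper names are hypothetical, and the resampler applies no anti-aliasing filter, so a dedicated audio library would be the better choice in practice.

```python
TARGET_RATE = 16_000  # sampling rate named in the feature list


def to_mono(frames):
    """Average each multi-channel frame (a tuple of samples) into one sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (illustration only, no low-pass filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of made-up stereo audio at 48 kHz.
stereo = [(0.5, -0.5)] * 24_000 + [(1.0, 0.0)] * 24_000
mono_16k = resample_linear(to_mono(stereo), src_rate=48_000)
print(len(mono_16k))
```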

Model Capabilities

Mandarin speech recognition
Audio transcription
Automatic speech recognition

Use Cases

Speech transcription
Mandarin speech to text
Converts Mandarin speech content into text.
Reported character error rate (CER) on the test set: 5.58%
Voice assistants
Chinese voice command recognition
Used for voice command recognition in Chinese voice assistants.
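
The CER figure quoted under speech transcription is a character error rate. As a hedged sketch of how such a metric is commonly computed, the code below divides the total character-level edit distance by the total number of reference characters. The source does not describe the scoring script behind the 5.58% result, so details such as punctuation and whitespace normalization are left out here. The example sentences are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits over total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


# Invented reference and hypothesis transcripts; one character differs.
refs = ["今天天气很好", "请打开空调"]
hyps = ["今天天气很好", "请打开空条"]
score = cer(refs, hyps)
print(f"CER = {score:.2%}")
```

Because Mandarin text is not whitespace-delimited, scoring at the character level avoids depending on a word segmenter.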