
stt_zh_citrinet_1024_gamma_0_25

Developed by NVIDIA
A non-autoregressive Citrinet model for Mandarin Chinese automatic speech recognition (ASR). It has approximately 140 million parameters and uses character-level encoding with CTC loss and decoding.
Release Time: 6/28/2022

Model Overview

This model is designed for Mandarin Chinese speech recognition. It was trained on the Aishell-2 dataset and transcribes 16 kHz mono audio into text.

Model Features

Non-Autoregressive Architecture
Uses Citrinet's non-autoregressive architecture with CTC loss and decoding, rather than a Transducer, for efficient speech recognition.
Character-Level Encoding
Encodes text at the character level using the standard character set provided with Aishell-2, which suits Chinese speech recognition.
Production-Level Deployment
Compatible with NVIDIA Riva for production-level server deployment.
Multi-Scenario Adaptation
Performs consistently across recording conditions such as iOS devices, Android devices, and standalone microphones.
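
To illustrate the CTC decoding mentioned above, here is a minimal sketch of greedy CTC decoding: take the most likely label per audio frame, collapse consecutive repeats, then drop the blank symbol. The toy vocabulary, the blank index, and the function name `ctc_greedy_decode` are assumptions made for illustration; they are not taken from the model's actual configuration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, vocab, blank_id):
    """Collapse consecutive repeated labels, then remove blanks."""
    # groupby merges runs of identical labels into a single label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank_id)


# Hypothetical three-character vocabulary; the real model uses the
# Aishell-2 character set.
vocab = ["你", "好", "吗"]
blank_id = len(vocab)  # assumption: the blank symbol is the last index

# Per-frame argmax labels; a blank between two identical labels keeps
# them as two separate characters.
frames = [0, 0, 3, 1, 1, 3, 3, 1, 2]
print(ctc_greedy_decode(frames, vocab, blank_id))
```

Because each frame is labeled independently, the whole utterance can be decoded in one pass, which is what makes a non-autoregressive CTC model efficient compared with step-by-step decoding.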

Model Capabilities

Chinese Speech Recognition
Real-Time Speech-to-Text
Supports 16 kHz Mono Audio Input
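
Since the model expects 16 kHz mono audio, it can help to check a file before transcription. The sketch below uses only Python's standard `wave` module and assumes uncompressed WAV input; the helper name `is_16k_mono` and the example file name are hypothetical.

```python
import wave


def is_16k_mono(path):
    """Return True if the WAV file is sampled at 16 kHz with one channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1


# Example usage (hypothetical file name):
# if not is_16k_mono("meeting.wav"):
#     print("Resample to 16 kHz mono before transcription")
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.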

Use Cases

Speech Transcription
Meeting Minutes
Automatically converts Chinese meeting recordings into text transcripts.
Achieves a character error rate (CER) of 5.1-5.5% on the Aishell-2 test set.
Voice Assistant
Provides speech recognition capabilities for Chinese voice assistants.
Speech Analysis
Customer Service Call Analysis
Automatically analyzes the content of Chinese customer service calls.
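
For readers who want to reproduce a character error rate like the one quoted for Aishell-2, the following is a minimal sketch of how CER is commonly computed: the character-level edit distance between reference and hypothesis, summed over all utterances and divided by the total number of reference characters. The function names and the example sentences are hypothetical, and the exact text normalization used for the published figure is not stated in this page.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_chars


# Hypothetical example: one substituted character out of six.
print(cer(["今天天气很好"], ["今天天汽很好"]))
```

A CER of 5.1-5.5% corresponds to roughly one character error in every 18 to 20 reference characters.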