Stt En Citrinet 1024 Gamma 0 25

Developed by NVIDIA
NVIDIA Streaming Citrinet 1024 is a non-autoregressive model for English automatic speech recognition, based on CTC loss/decoding, with approximately 140 million parameters.
Downloads 156
Release Time: 6/24/2022

Model Overview

This model transcribes English speech into text made up of lowercase letters, spaces, and apostrophes. It was trained on thousands of hours of English speech data and is the 'large' non-autoregressive variant of streaming Citrinet.
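
To make the CTC decoding step concrete, here is a minimal sketch of greedy CTC decoding over the output alphabet described above. The character-level vocabulary, the blank index, and the function name `ctc_greedy_decode` are assumptions made for illustration; the released model may well use subword tokens and a different index layout.

```python
# Hypothetical character vocabulary: lowercase letters, space, apostrophe.
# This is an illustrative assumption, not the model's actual tokenizer.
VOCAB = list("abcdefghijklmnopqrstuvwxyz '")
BLANK = len(VOCAB)  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_ids):
    """Collapse consecutive repeated ids, then drop blank ids."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != BLANK:
            chars.append(VOCAB[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    # Per-frame argmax ids for "h h <blank> i i"
    print(ctc_greedy_decode([7, 7, BLANK, 8, 8]))
```

A blank between two identical ids keeps both characters, which is how CTC represents double letters such as the "ll" in "hello".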

Model Features

Streaming Capability
Supports streaming speech recognition, suitable for real-time applications.
High Performance
Strong results on multiple standard test sets, for example a word error rate (WER) of 3.4% to 7.6% on LibriSpeech.
Large-scale Training Data
Trained on thousands of hours of English speech drawn from multiple datasets, including LibriSpeech and Fisher.
Riva Compatibility
Can be integrated with NVIDIA Riva for production-grade server deployment.
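
For readers unfamiliar with the WER metric quoted above, the following sketch shows how it is commonly computed: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and this is not necessarily the exact scoring script used to produce the reported numbers.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.1%}")
```

A WER of 3.4% therefore corresponds to roughly 3 to 4 word errors per 100 reference words.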

Model Capabilities

English Speech Recognition
Real-time Speech Transcription
Batch Audio Processing
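
The difference between real-time and batch use can be illustrated with a small sketch: in a streaming setting the audio reaches the recognizer in fixed-size chunks rather than as a whole file. The helper `iter_chunks`, the 16 kHz sample rate, and the 160 ms chunk length are all assumptions chosen for illustration; they are not taken from the model's documentation.

```python
def iter_chunks(samples, chunk_size):
    """Yield consecutive slices of `samples`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


if __name__ == "__main__":
    SAMPLE_RATE = 16000  # assumed sample rate in Hz
    CHUNK_MS = 160       # assumed chunk length in milliseconds
    chunk_size = SAMPLE_RATE * CHUNK_MS // 1000

    audio = [0.0] * SAMPLE_RATE  # one second of silence as placeholder input
    for i, chunk in enumerate(iter_chunks(audio, chunk_size)):
        # A streaming recognizer would consume each chunk here.
        print(i, len(chunk))
```

Batch processing is the degenerate case in which each file is passed to the recognizer in one piece.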

Use Cases

Speech-to-Text
Meeting Minutes
Automatically convert meeting recordings into text transcripts.
Highly accurate transcription results.
Subtitle Generation
Automatically generate English subtitles for video content.
Supports batch processing of audio files.
Voice Assistant
Voice Command Recognition
Used in voice command recognition systems for smart devices.
Low-latency real-time recognition.