
STT En Conformer-CTC Large

Developed by NVIDIA
This is a large automatic speech recognition (ASR) model based on the Conformer architecture. It transcribes English speech and was trained with the Connectionist Temporal Classification (CTC) loss function.
Downloads 3,740
Release Time: 4/9/2022

Model Overview

This model transcribes English speech into text. Its output consists of lowercase letters, spaces, and apostrophes. It is based on a non-autoregressive variant of the Conformer architecture and has approximately 120 million parameters.
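To illustrate what non-autoregressive CTC output looks like, here is a minimal sketch of greedy CTC decoding: take the most likely label per audio frame, collapse consecutive repeats, then drop the blank symbol. The character-level vocabulary and the blank at index 0 are assumptions made for this example only; the released model may use a different tokenizer (for instance sub-word units) and a different blank position.

```python
from itertools import groupby

# Assumption for this sketch: index 0 is the CTC blank, followed by the
# characters the model card says the model can emit (lowercase letters,
# space, apostrophe). The real model's vocabulary may differ.
BLANK = 0
VOCAB = ["<blank>"] + list("abcdefghijklmnopqrstuvwxyz '")


def encode(ch):
    """Map a character to its (hypothetical) vocabulary index."""
    return VOCAB.index(ch)


def ctc_greedy_decode(frame_ids):
    """Collapse consecutive repeated labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(VOCAB[i] for i in collapsed if i != BLANK)


# Per-frame labels for "hello": the blank between the two "l" frames is what
# keeps the double letter from being collapsed into one.
frames = [encode("h"), encode("h"), BLANK, encode("e"),
          encode("l"), BLANK, encode("l"), encode("o")]
print(ctc_greedy_decode(frames))  # hello
```

Because every frame is labelled independently of the others, all frames can be scored in a single forward pass, which is where the speed advantage over step-by-step autoregressive decoding comes from.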

Model Features

High-performance speech recognition
Achieves a word error rate (WER) of 2.2% on the LibriSpeech test-clean set and 4.3% on test-other.
Multi-dataset training
Trained on thousands of hours of English speech drawn from multiple datasets, including LibriSpeech, Fisher, and Switchboard.
Riva compatible
Supports production-level server deployment via NVIDIA Riva.
Non-autoregressive architecture
Uses the Conformer-CTC architecture, which offers faster inference than autoregressive models.
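The WER figures quoted in the feature list are ratios of word-level edit operations (substitutions, deletions, insertions) to the number of reference words. The sketch below computes that ratio for a single utterance with a standard Levenshtein distance over words. The function name and the example sentences are illustrative assumptions, and this is not the evaluation script used to produce the reported numbers; benchmark WER is normally aggregated over a whole test set rather than averaged per utterance.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.1%}")  # 33.3%
```

A WER of 2.2% therefore corresponds to roughly two word errors per hundred reference words.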

Model Capabilities

English speech recognition
Real-time speech transcription
Supports 16 kHz mono audio input
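Since the model expects 16 kHz mono input, it can help to check a recording before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module; the function names are hypothetical, and it only handles uncompressed WAV files. Audio in other formats or at other sample rates would need to be converted with a separate tool first.

```python
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def is_16khz_mono(path):
    """Return True if the WAV file at `path` is 16 kHz and single-channel."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE_HZ
                and wav.getnchannels() == EXPECTED_CHANNELS)


def write_silence(path, rate_hz, channels, seconds=0.1):
    """Demo helper: write a short silent 16-bit PCM WAV file."""
    n_frames = int(rate_hz * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames * channels)
```

For example, a file written with `write_silence("ok.wav", 16000, 1)` passes the check, while one written with `write_silence("stereo.wav", 44100, 2)` does not.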

Use Cases

Speech transcription
Meeting minutes
Automatically transcribe meeting recordings into text records
Highly accurate transcription across a variety of accents
Subtitle generation
Automatically generate English subtitles for video content
WER as low as 2.2% on clean speech
Voice assistant
Voice command recognition
Used for voice control of smart home devices
Fast and accurate command recognition