Stt En Conformer Transducer Xlarge

Developed by NVIDIA
This is an Automatic Speech Recognition (ASR) model developed by NVIDIA, based on the Conformer-Transducer architecture, with approximately 600 million parameters, specifically designed for English speech transcription.
Downloads 496
Release Time: 6/13/2022

Model Overview

This model transcribes speech into lowercase English letters, including spaces and apostrophes. It is the 'extra-large' version of the Conformer-Transducer model.
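Because the output uses only lowercase letters, spaces, and apostrophes, reference transcripts usually need to be mapped to the same character set before they are compared with the model output, for example when computing a word error rate (WER). The following is a minimal sketch of that idea, not NVIDIA's evaluation code: the function names `normalize_transcript` and `word_error_rate` are hypothetical, and the normalization rules (digits and punctuation are simply dropped rather than spelled out) are a simplifying assumption.

```python
import re


def normalize_transcript(text: str) -> str:
    """Map text to lowercase a-z, spaces and apostrophes (assumed rules)."""
    text = text.lower().replace("\u2019", "'")  # curly apostrophe -> straight
    text = re.sub(r"[^a-z' ]+", " ", text)      # any other character becomes a space
    return " ".join(text.split())               # collapse repeated whitespace


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = normalize_transcript(reference).split()
    hyp = normalize_transcript(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard dynamic-programming Levenshtein distance over words,
    # keeping only the previous row to save memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(normalize_transcript("Hello, World! It's late."))
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The function returns a fraction; multiply by 100 to compare with WER figures reported as percentages.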

Model Features

High-performance speech recognition
Achieves low word error rates (WER) on multiple test sets. For example, the WER on the LibriSpeech clean test set is 1.62%.
Large-scale training data
Trained on a composite dataset (NeMo ASRSET) containing thousands of hours of English speech.
16 kHz mono audio input
Accepts 16 kHz mono audio (WAV files) as input.
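
Since the input is expected to be a 16 kHz single-channel WAV file, it can help to check a recording before passing it to the model. The sketch below uses only Python's standard `wave` module; the helper name `check_wav_format` is hypothetical and is not part of any NVIDIA tooling. It only reports mismatches and does not resample or downmix.

```python
import wave
from typing import List


def check_wav_format(path: str, expected_rate: int = 16000) -> List[str]:
    """Return a list of problems found; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected mono audio, found {channels} channels")
        if rate != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

A file that fails either check would need to be converted with an audio tool before transcription.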

Model Capabilities

English speech recognition
Audio transcription
Automatic speech-to-text conversion

Use Cases

Speech transcription
Meeting minutes
Automatically transcribe meeting recordings into text records.
Highly accurate transcription results
Voice note conversion
Convert voice memos into searchable text.
Voice assistant
Voice command recognition
A voice command recognition system for smart devices.