Stt En Fastconformer Transducer Xlarge

Developed by NVIDIA
NVIDIA FastConformer-Transducer XLarge is an English automatic speech recognition (ASR) model that combines an optimized FastConformer architecture with a Transducer decoder and has approximately 618 million parameters.
Downloads 106
Release Time: 6/12/2023

Model Overview

This model transcribes speech into lowercase English letters and is the "extra large" version of the FastConformer Transducer model. Trained on multiple English speech datasets, it achieves a low word error rate on multiple test sets.

Model Features

Optimized FastConformer Architecture
Uses an optimized Conformer architecture with 8x depthwise-separable convolutional downsampling, which shortens the sequence the rest of the network has to process and improves efficiency.
Multi-dataset Training
Trained on a composite dataset comprising thousands of hours of English speech, covering diverse speech scenarios.
High Accuracy
Performs well on multiple test sets, with a word error rate (WER) as low as 1.64% on the LibriSpeech test set.
Transducer Decoder
Trained with the RNN-Transducer (RNNT) loss in a multi-task setting to improve recognition performance.
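
To make the effect of the 8x downsampling concrete, the sketch below estimates how many frames remain after downsampling for a given audio duration. The 10 ms feature hop is an assumption (a common default for ASR front ends, not stated on this page), the function name is hypothetical, and the rounding is approximate, so treat the numbers as illustrative rather than exact.

```python
import math


def encoder_frames(duration_s: float, hop_ms: float = 10.0, downsample: int = 8) -> int:
    """Approximate number of frames left after convolutional downsampling.

    hop_ms is an assumed feature hop size; downsample is the 8x factor
    mentioned in the feature description above.
    """
    # Number of acoustic feature frames produced by the front end.
    feature_frames = round(duration_s * 1000 / hop_ms)
    # Downsampling divides the sequence length by the given factor.
    return math.ceil(feature_frames / downsample)


# 10 seconds of audio: 1000 feature frames, 125 frames after 8x downsampling.
print(encoder_frames(10.0))
# The same audio with a 4x factor would leave twice as many frames.
print(encoder_frames(10.0, downsample=4))
```

Under these assumptions each downsampled frame covers roughly 80 ms of audio, which is why a larger downsampling factor reduces the work done by the later layers.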

Model Capabilities

English speech recognition
Audio transcription
Speech-to-text
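
A minimal sketch of how these capabilities might be used through NVIDIA NeMo follows. It assumes the `nemo_toolkit[asr]` package is installed and that the checkpoint is published under the identifier `nvidia/stt_en_fastconformer_transducer_xlarge`; neither detail is stated on this page. The helper names `load_model` and `transcribe` are hypothetical, and the return type of NeMo's `transcribe` call varies between NeMo versions, so the result is returned unchanged.

```python
# Assumed model identifier; verify it against the official model card.
MODEL_NAME = "nvidia/stt_en_fastconformer_transducer_xlarge"


def load_model(model_name: str = MODEL_NAME):
    """Load the pretrained checkpoint through NeMo (downloads on first use)."""
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=model_name)


def transcribe(paths, model=None):
    """Transcribe a list of audio file paths.

    The input files are assumed to be mono WAV audio in the format the
    model expects. The raw NeMo result is returned as-is.
    """
    if model is None:
        model = load_model()
    return model.transcribe(list(paths))
```

A typical call would be `transcribe(["meeting.wav"])`, where `meeting.wav` is a placeholder file name. Loading the model once and passing it in through the `model` argument avoids reloading the checkpoint for every batch of files.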

Use Cases

Speech Transcription
Meeting Minutes
Automatically transcribe meeting recordings into text records.
Produces text records with a model whose reported WER is as low as 1.64% on the LibriSpeech test set.
Voice Assistants
Provide speech recognition capabilities for voice assistants.
Supports accurate recognition across various speech scenarios.
Media Processing
Video Subtitle Generation
Automatically generate subtitles for video content.
Supports recognition of various accents and speech styles.
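
The WER figure quoted above is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below shows one way to compute it on your own recordings. Because the model emits lowercase English letters, the reference is lowercased and stripped of punctuation before scoring; this normalization rule and the function names are assumptions for illustration, not the procedure used to produce the reported number.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and keep only letters, apostrophes, and spaces."""
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("The cat sat on the mat.", "the cat sat on a mat"))
```

Scoring a sample of your own meeting or video audio this way gives a more relevant accuracy estimate for these use cases than a benchmark figure alone.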