Stt En Fastconformer Hybrid Large Streaming Multi

Developed by NVIDIA
Cache-aware FastConformer-Hybrid large model supporting multiple look-ahead windows, specifically designed for streaming automatic speech recognition, adaptable to various latency scenarios
Downloads 1,400
Release Time: 10/5/2023

Model Overview

Streaming automatic speech recognition model trained on large-scale English speech data, employing a hybrid FastConformer architecture with flexible latency adjustment support

Model Features

Multi-Latency Streaming
Supports four look-ahead latency levels: 0 ms, 80 ms, 480 ms, and 1040 ms. Actual latency is approximately half of the nominal value
Hybrid Architecture
Combines the advantages of Transducer and CTC decoders, supporting runtime switching of decoding strategies
Cache-Aware Technology
Uses caching during streaming so that streaming-mode predictions stay consistent with offline-mode predictions
Large-Scale Training Data
Trained on thousands of hours of diverse English speech data, covering multiple scenarios and accents
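
The multi-latency feature above amounts to simple arithmetic: each nominal level corresponds to an actual latency of roughly half its value. The Python sketch below tabulates this and picks a level for a given latency budget. It is an illustration only: the function names and the `pick_latency_level` helper are hypothetical and not part of any model API, and the halving rule is taken as an approximation from the description above.

```python
# Nominal look-ahead latencies listed for this model, in milliseconds.
NOMINAL_LATENCIES_MS = (0, 80, 480, 1040)


def approx_actual_latency_ms(nominal_ms: int) -> float:
    """Rough estimate: actual latency is about half the nominal value."""
    return nominal_ms / 2


def pick_latency_level(budget_ms: float) -> int:
    """Hypothetical helper: return the largest nominal level whose
    estimated actual latency fits within budget_ms (assumed >= 0)."""
    fitting = [
        nominal
        for nominal in NOMINAL_LATENCIES_MS
        if approx_actual_latency_ms(nominal) <= budget_ms
    ]
    return max(fitting)  # the 0 ms level always fits a non-negative budget


if __name__ == "__main__":
    for nominal in NOMINAL_LATENCIES_MS:
        estimate = approx_actual_latency_ms(nominal)
        print(f"{nominal:>5} ms nominal -> about {estimate:.0f} ms actual")
    # Example: choose a level for an assumed 300 ms latency budget.
    print("level for 300 ms budget:", pick_latency_level(300))
```

Choosing the largest level that fits the budget reflects the usual trade-off: more look-ahead generally gives the model more context at the cost of added delay.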

Model Capabilities

Real-time speech-to-text
Streaming audio processing
Low-latency speech recognition
Multi-scenario speech transcription

Use Cases

Real-time Transcription
Meeting Live Captioning
Provides low-latency real-time captions for online meetings
5.7% WER at 480ms latency
Customer Service Voice Analysis
Real-time transcription of audio conversations for quality analysis
Supports dynamic latency adjustment to meet various scenario requirements
Media Processing
Video Subtitle Generation
Automatically generates high-precision subtitles for media content
5.4% WER in 1040ms mode
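
The WER (word error rate) figures quoted in the use cases above follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of reference words. The Python sketch below computes it with a word-level edit distance. The sentences are made-up examples, and this is not necessarily how the reported figures were measured, since the evaluation setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one of six reference words is dropped.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.1%}")
```

A WER of 5.7% therefore means roughly 6 word errors per 100 reference words.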