Parakeet TDT-CTC 110M

Developed by NVIDIA
An English speech recognition model jointly developed by the NVIDIA NeMo and Suno.ai teams. It produces transcripts with punctuation and capitalization and is based on the FastConformer TDT-CTC architecture.
Downloads 50.47k
Release Time: 9/17/2024

Model Overview

This is an automatic speech recognition (ASR) model that transcribes English speech with punctuation and capitalization. It is based on the hybrid FastConformer TDT-CTC architecture and has approximately 114 million parameters.

Model Features

Efficient Long Audio Processing
Uses a FastConformer architecture with full attention, capable of transcribing audio of up to 20 minutes in a single pass
Fast Inference Speed
Achieves an average RTFx (inverse real-time factor: audio duration divided by processing time) of approximately 5300 on an A100 GPU
Punctuation and Capitalization
Outputs English transcripts with punctuation and capitalization
Large-scale Training Data
Trained on 36,000 hours of English speech data, including both private and public datasets
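
To put the RTFx figure above in concrete terms, the following is a minimal Python sketch of the arithmetic. The function names `rtfx` and `processing_time` are hypothetical helpers, not part of any NVIDIA or NeMo API. The result is only a back-of-the-envelope estimate: the reported value is an average, so latency for a single file on a given setup may differ.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Estimated compute time for a clip, given an assumed RTFx."""
    if rtfx_value <= 0:
        raise ValueError("rtfx_value must be positive")
    return audio_seconds / rtfx_value


# A 20-minute clip (the single-pass limit quoted above) at the quoted
# average RTFx of roughly 5300.
twenty_minutes = 20 * 60
estimate = processing_time(twenty_minutes, 5300.0)
print(f"Estimated processing time: {estimate:.3f} s")
```

Under these assumptions, a 20-minute recording corresponds to well under one second of compute.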

Model Capabilities

English speech recognition
Punctuated transcripts
Capitalized transcripts
Long audio processing

Use Cases

Speech Transcription
Meeting Minutes Transcription
Convert meeting recordings into punctuated text transcripts
Achieves 15.88% WER on the AMI meeting test set
Podcast Transcription
Convert podcast audio content into text
Achieves 2.4% to 5.2% WER on the LibriSpeech test sets
Speech Analysis
Financial Earnings Call Analysis
Analyze company earnings call content
Achieves 12.42% WER on the Earnings-22 dataset
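
The WER (word error rate) figures quoted above count word-level substitutions, deletions, and insertions relative to the number of reference words. Below is a minimal Python sketch of that calculation using a word-level edit distance. The function `word_error_rate` and the sample sentences are illustrative assumptions, not the evaluation code behind the reported numbers. Benchmark evaluations typically normalize text (case, punctuation) before scoring, and this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one word ("by") is dropped from a 7-word reference.
reference = "the quarterly revenue grew by ten percent"
hypothesis = "the quarterly revenue grew ten percent"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Because the comparison here is exact string matching, a difference in capitalization or punctuation counts as a substitution.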