Parakeet RNNT 0.6B

Developed by NVIDIA
Parakeet RNNT 0.6B is an automatic speech recognition model jointly developed by NVIDIA NeMo and Suno.ai, based on the FastConformer architecture with approximately 600 million parameters, specifically designed for transcribing English speech into text.
Downloads 92.27k
Release Time: 12/28/2023

Model Overview

This model is a high-performance automatic speech recognition system capable of accurately converting English speech into lowercase English text. It has been trained on various public and private datasets, making it suitable for a wide range of speech recognition scenarios.
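A minimal sketch of how a model like this is typically loaded and run through the NVIDIA NeMo toolkit. The checkpoint name "nvidia/parakeet-rnnt-0.6b" and the helper name transcribe_files are assumptions for illustration and are not stated on this page; the exact shape of the returned transcriptions also varies between NeMo versions.

```python
def transcribe_files(audio_paths, model_name="nvidia/parakeet-rnnt-0.6b"):
    """Transcribe a list of audio files with a pretrained NeMo RNNT model.

    Assumptions (not from this page): the checkpoint name above, and that
    the NeMo toolkit with its ASR collection is installed.
    """
    # Imported lazily so this helper can be defined without NeMo present.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use, then loads it from cache.
    asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
        model_name=model_name
    )

    # Returns one transcription per input file; the container type
    # (plain strings or hypothesis objects) depends on the NeMo version.
    return asr_model.transcribe(audio_paths)


# Hypothetical usage; "meeting.wav" is a placeholder file name:
# results = transcribe_files(["meeting.wav"])
```

The output is lowercase English text, so any casing or punctuation needed downstream would have to be restored in a separate step.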

Model Features

High-Performance FastConformer Architecture
Utilizes an optimized FastConformer architecture with 8x depthwise separable convolution downsampling, providing efficient speech recognition capabilities.
Large-Scale Training Data
Trained on 64K hours of English speech data, including various public and private datasets, ensuring broad applicability of the model.
Multi-Task Training
Trained in a multi-task setup with a Transducer decoder (RNNT) loss, enhancing the model's recognition accuracy.
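To make the effect of 8x downsampling concrete, the sketch below estimates how many frames the encoder processes for a clip of a given length. The 10 ms feature hop is an assumption (a common default for speech front ends, not stated on this page), and real frame counts can differ by a frame or two depending on padding.

```python
import math


def encoder_frame_ms(hop_ms=10, downsampling=8):
    """Time span covered by one encoder frame, in milliseconds.

    hop_ms=10 is an assumed feature hop; downsampling=8 is the factor
    named in the feature description above.
    """
    return hop_ms * downsampling


def approx_encoder_frames(duration_s, hop_ms=10, downsampling=8):
    """Approximate number of encoder frames for a clip of duration_s seconds."""
    feature_frames = round(duration_s * 1000 / hop_ms)
    # Each encoder frame summarizes `downsampling` feature frames.
    return math.ceil(feature_frames / downsampling)


if __name__ == "__main__":
    print(encoder_frame_ms())           # ms of audio per encoder frame
    print(approx_encoder_frames(10.0))  # encoder frames for a 10 s clip
```

Under these assumptions each encoder frame covers 80 ms of audio, so the attention layers see one eighth as many time steps as the raw feature sequence, which is where the efficiency gain comes from.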

Model Capabilities

English Speech Recognition
High-Accuracy Text Transcription
Support for Multiple Audio Formats

Use Cases

Speech-to-Text
Meeting Minutes
Automatically transcribes meeting recordings to generate text records.
Achieves a WER of 17.55 on the AMI meeting test set
Voice Assistants
Provides accurate speech recognition capabilities for voice assistants.
Achieves a WER between 1.63 and 3.06 on the LibriSpeech test sets
Media Caption Generation
Automatically generates captions for video and audio content.
Achieves a WER of 3.86 on TEDLIUM-v3
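The WER (word error rate) figures quoted above are, by the usual convention, percentages: the number of word substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. A minimal sketch of that calculation follows; the function names are illustrative, and the lowercasing and whitespace splitting are simplifying assumptions rather than the exact normalization used for the reported scores.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level word error rate, as a percentage."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * total_errors / total_words


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(wer(["the cat sat on the mat"], ["the cat sat on a mat"]))
```

Read this way, a WER of 3.86 corresponds to roughly four word errors per hundred reference words.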