SpeechT5 ASR

Developed by Microsoft
A SpeechT5 automatic speech recognition model fine-tuned on the LibriSpeech dataset, supporting speech-to-text conversion.
Downloads: 12.30k
Release date: 2/2/2023

Model Overview

SpeechT5 is a unified encoder-decoder pre-training framework designed for spoken language processing tasks, supporting various applications such as speech recognition.

Model Features

Unified Modal Framework
Processes speech and text through a shared encoder-decoder network to achieve cross-modal representation learning.
Cross-modal Vector Quantization
Randomly mixes speech and text states with latent units, which serve as the interface between the encoder and decoder, to align text and speech information in a unified semantic space.
Multi-task Support
The SpeechT5 framework is not limited to speech recognition: it can also be used for speech synthesis, speech translation, voice conversion, and other spoken language processing tasks.

Model Capabilities

Speech Recognition
Speech-to-Text
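
The speech-to-text capability can be sketched with the Hugging Face transformers library, which provides SpeechT5Processor and SpeechT5ForSpeechToText. This is a minimal sketch, not an official recipe: the Hub identifier "microsoft/speecht5_asr", the helper names check_sample_rate and transcribe, and the max_length default are assumptions made for illustration. The checkpoint is assumed to expect 16 kHz mono audio, so the helper rejects other sampling rates before any model weights are loaded.

```python
from typing import Sequence

# Assumed Hugging Face Hub identifier for this checkpoint (not stated on this page).
MODEL_ID = "microsoft/speecht5_asr"
# Assumption: the checkpoint expects 16 kHz mono audio.
EXPECTED_SAMPLE_RATE = 16_000


def check_sample_rate(sampling_rate: int) -> None:
    """Raise ValueError if the audio is not sampled at the expected rate."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the waveform first"
        )


def transcribe(
    waveform: Sequence[float],
    sampling_rate: int = EXPECTED_SAMPLE_RATE,
    max_length: int = 100,
) -> str:
    """Transcribe a single mono waveform (hypothetical helper)."""
    check_sample_rate(sampling_rate)

    # Imported lazily so the rate check works without transformers installed.
    from transformers import SpeechT5ForSpeechToText, SpeechT5Processor

    # Downloads the processor and weights on first use.
    processor = SpeechT5Processor.from_pretrained(MODEL_ID)
    model = SpeechT5ForSpeechToText.from_pretrained(MODEL_ID)

    # Convert raw samples into model inputs as PyTorch tensors.
    inputs = processor(
        audio=waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    # Autoregressively generate token ids, then decode them to text.
    predicted_ids = model.generate(**inputs, max_length=max_length)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Calling transcribe(samples) with a list or NumPy array of 16 kHz float samples should return one transcription string. For repeated calls it would be cheaper to load the processor and model once and reuse them.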

Use Cases

Speech Processing
Automatic Speech Recognition
Converts speech content into text, suitable for meeting transcripts, voice assistants, and other scenarios.
Reported to perform well on LibriSpeech, the dataset on which the model was fine-tuned.
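
Performance claims for speech recognition are usually quantified as word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. This page gives no score, so the following is only a sketch of how such a figure could be computed. The function name word_error_rate and the lowercase, whitespace-split normalization are assumptions, and published LibriSpeech evaluations may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against no hypothesis words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # deletion: reference word missing from output
                    curr[j - 1] + 1,  # insertion: extra word in output
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {score:.3f}")
```

A lower value is better, and 0.0 means the output matches the reference word for word. Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.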