Wav2vec2 Large Robust Ft Libritts Voxpopuli
Developed by jbetker
A speech recognition model based on wav2vec2-large that transcribes speech into punctuated text, intended for building transcripts for TTS models.
Downloads 339.01k
Release Time: 3/2/2022
Model Overview
This model fine-tunes the facebook/wav2vec2-large-robust-ft-libri-960h checkpoint with punctuation marks added to its vocabulary, so that it outputs punctuated transcripts. It is aimed at TTS applications, where punctuation carries prosody cues.
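To illustrate what adding punctuation to the vocabulary means for a CTC-style model, here is a minimal sketch of greedy decoding over a toy vocabulary. The vocabulary below is hypothetical and is not the model's real token table; the blank token `<pad>` and the word delimiter `|` are assumptions that follow the convention of the base wav2vec2 checkpoints.

```python
from itertools import groupby

# Hypothetical vocabulary for illustration only: a blank token, a word
# delimiter, the base characters, and the added punctuation marks.
BLANK = "<pad>"
WORD_DELIM = "|"
VOCAB = [BLANK, WORD_DELIM] + list("ABCDEFGHIJKLMNOPQRSTUVWXYZ'") + list(".,?!")


def greedy_ctc_decode(frame_ids, vocab=VOCAB):
    """Collapse repeated frame ids, drop blanks, turn delimiters into spaces."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    tokens = [vocab[i] for i in collapsed if vocab[i] != BLANK]
    return "".join(" " if t == WORD_DELIM else t for t in tokens).strip()


def ids(symbols):
    """Helper for the demo: map symbols to their vocabulary indices."""
    return [VOCAB.index(s) for s in symbols]


# One id per audio frame, as an argmax over CTC logits would produce.
frames = ids(["H", "H", "<pad>", "I", ",", "|", "|",
              "B", "O", "O", "<pad>", "B", ".", "."])
print(greedy_ctc_decode(frames))  # HI, BOB.
```

Because the punctuation marks are ordinary vocabulary entries, they are emitted by the same decoding step as letters and need no separate punctuation-restoration pass.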
Model Features
Punctuation generation
Produces punctuated transcripts; punctuation is important for the prosody of TTS models.
High accuracy
Achieves a 4.45% word error rate (WER) on the LibriSpeech validation set, close to the baseline model's 4.3%.
Clean audio optimization
Fine-tuned on clean audio datasets such as LibriTTS and VoxPopuli, making it suitable for transcribing high-quality audio.
Model Capabilities
Speech-to-text
Punctuation insertion
High-quality audio transcription
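A minimal sketch of how these capabilities might be used through the Hugging Face transformers library. The repository id is an assumption inferred from the model name and developer on this page, and the sketch assumes the repository ships processor files (feature extractor and tokenizer); if it does not, those would need to be loaded separately. The input is assumed to be 16 kHz mono audio, as the wav2vec2 base checkpoints expect.

```python
def transcribe(waveform, sampling_rate=16_000,
               model_id="jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli"):
    """Return a punctuated transcript for a 1-D float waveform.

    model_id is an assumed repository id, not confirmed by this page.
    """
    # Imported lazily so the heavy dependencies load only when needed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

    # Normalize the raw audio and wrap it in a batch of size one.
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: most likely token per frame, then collapse.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(audio)` with a NumPy array or list of samples downloads the weights on first use, so it needs network access and several gigabytes of memory.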
Use Cases
Text-to-speech (TTS)
Building transcripts for TTS models
Generates punctuated transcripts for TTS models to improve their prosody.
Improves the naturalness and expressiveness of TTS output.
Speech transcription
High-quality audio transcription
Suitable for transcription tasks on clean audio such as LibriTTS.
Reaches a 4.45% word error rate (WER) on the LibriSpeech validation set.
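For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a generic implementation for illustration, not the evaluation script behind the figures on this page; in particular, how punctuation and casing were normalized for the reported 4.45% is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")  # 0.333 (one substitution and one deletion over six words)
```

With whitespace tokenization, a word with attached punctuation such as "mat." differs from "mat", so any comparison of punctuated transcripts depends on whether punctuation is stripped before scoring.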