
Wav2vec2 Large Robust Ft Libritts Voxpopuli

Developed by jbetker
A speech recognition model based on wav2vec2-large, designed to produce punctuated transcriptions suitable for building text-to-speech (TTS) training data.
Downloads: 339.01k
Release Time: 3/2/2022

Model Overview

This model fine-tunes the facebook/wav2vec2-large-robust-ft-libri-960h checkpoint with a vocabulary extended to include punctuation, so that its transcriptions carry punctuation marks. This makes it especially suitable for TTS applications, where punctuation informs prosody.
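Because punctuation is added as ordinary vocabulary tokens, a CTC model such as this one can emit commas and periods the same way it emits letters. The sketch below illustrates greedy CTC decoding over such a vocabulary: merge repeated frame predictions, drop the blank token, and turn the word delimiter into spaces. The vocabulary is hypothetical (the card does not list the checkpoint's actual punctuation tokens), and the use of `<pad>` as the CTC blank and `|` as the word delimiter follows the convention of the Hugging Face wav2vec2 English checkpoints, assumed here rather than confirmed for this model.

```python
from typing import Dict, List, Optional, Sequence

BLANK = "<pad>"   # CTC blank token (wav2vec2 convention, assumed here)
WORD_DELIM = "|"  # word delimiter token (wav2vec2 convention, assumed here)

# Hypothetical vocabulary: the real token set of this checkpoint may differ.
VOCAB: List[str] = [BLANK, WORD_DELIM] + list("ABCDEFGHIJKLMNOPQRSTUVWXYZ'") + [",", ".", "?", "!"]


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str] = VOCAB) -> str:
    """Greedy CTC decoding: merge repeated ids, drop blanks, turn '|' into spaces."""
    pieces: List[str] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        if idx != prev:  # a new id starts a new token; consecutive repeats are merged
            token = vocab[idx]
            if token != BLANK:
                pieces.append(" " if token == WORD_DELIM else token)
        prev = idx
    # Collapse runs of delimiters and trim leading/trailing spaces.
    return " ".join("".join(pieces).split())


if __name__ == "__main__":
    ids: Dict[str, int] = {tok: i for i, tok in enumerate(VOCAB)}
    # Per-frame argmax tokens; the blank between the two L's keeps them distinct.
    frames = ["H", "H", "E", BLANK, "L", "L", BLANK, "L", "O", ",", "|",
              "W", "O", "R", "L", "D", ".", "."]
    print(ctc_greedy_decode([ids[t] for t in frames]))  # HELLO, WORLD.
```

The blank token is what lets the decoder output a genuinely doubled letter ("LL") while still merging a letter held across several frames; punctuation tokens pass through the same rule, so a period predicted over two frames appears once.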

Model Features

Punctuation generation
Designed to generate punctuated transcriptions, which are crucial for the prosody of TTS models.
High accuracy
Achieves a 4.45% word error rate (WER) on the LibriSpeech validation set, close to the 4.3% of the baseline model.
Clean audio optimization
Fine-tuned on clean audio datasets such as LibriTTS and VoxPopuli, making it suitable for transcribing high-quality audio.
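The WER figures above compare the number of word-level edits (substitutions S, deletions D, insertions I) against the number of reference words N, i.e. WER = (S + D + I) / N. The sketch below computes a corpus-level WER with a word-level Levenshtein distance. The card does not say how the 4.45% figure was scored; in particular, the `normalize` step (lowercasing and stripping punctuation before comparison) is an assumption, and the function names are illustrative.

```python
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation (an assumed scoring convention)."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words across all pairs."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    return edits / words if words else 0.0


if __name__ == "__main__":
    pairs = [("the cat sat", "the cat sat down"), ("a b c d", "a x c d")]
    print(f"{corpus_wer(pairs):.2%}")  # 28.57%
```

Summing edits and reference words over the whole set, rather than averaging per-utterance WERs, keeps short utterances from dominating the score. Since this model's selling point is punctuation, a stricter variant could skip the punctuation stripping and score punctuation tokens as well.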

Model Capabilities

Speech-to-text
Punctuation insertion
High-quality audio transcription

Use Cases

Text-to-speech (TTS)
TTS model transcription construction
Generates punctuated transcriptions for TTS models, improving their prosody.
Improves the naturalness and expressiveness of TTS output.
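When building a TTS dataset, the punctuated transcriptions are typically collected into a manifest that pairs each audio clip with its text. The sketch below writes a pipe-separated `clip_id|text` file in the style of LJSpeech's `metadata.csv`; that layout, the clip ids, and the function names are assumptions for illustration, since the card does not prescribe a format. The transcripts themselves would come from running this model on each clip.

```python
from typing import Dict, List


def build_manifest(transcripts: Dict[str, str]) -> List[str]:
    """Turn {clip_id: punctuated transcript} into 'clip_id|text' lines."""
    lines: List[str] = []
    for clip_id in sorted(transcripts):
        # Collapse stray whitespace but keep punctuation: it carries the prosody cues.
        text = " ".join(transcripts[clip_id].split())
        if not text:
            continue  # skip clips with an empty transcription
        if "|" in clip_id or "|" in text:
            raise ValueError(f"'|' would break the manifest layout: {clip_id!r}")
        lines.append(f"{clip_id}|{text}")
    return lines


def write_manifest(transcripts: Dict[str, str], path: str) -> None:
    """Write the manifest as UTF-8, one line per clip."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_manifest(transcripts):
            f.write(line + "\n")
```

For example, `build_manifest({"clip_001": "Hello, world."})` returns `["clip_001|Hello, world."]`. Keeping the commas and periods intact is deliberate: stripping them would discard exactly the information this model was fine-tuned to provide.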
Speech transcription
High-quality audio transcription
Suitable for transcription tasks on clean audio such as LibriTTS.
Word error rate (WER) of 4.45% on the LibriSpeech validation set.