Wav2vecbert2 Filledpause

Developed by classla
A model that classifies 20-millisecond audio frames to detect filler pauses (e.g., 'eee', 'errm')
Downloads 4,290
Release Time: 8/28/2024

Model Overview

This model is fine-tuned from the facebook/w2v-bert-2.0 foundation model to detect filler pauses in speech.

Model Features

Multilingual support
Supports filler pause detection in five languages: Slovenian, Croatian, Serbian, Czech, and Polish
High-precision detection
Achieves an F1 score of 0.968 on the ROG corpus
Intelligent post-processing
Post-processing the predictions, for example by removing short segments at the beginning and end, significantly improves performance on the ParlaSpeech corpus
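
The post-processing idea can be illustrated with a minimal sketch. It assumes the classifier output has already been reduced to one 0/1 label per 20-millisecond frame (1 = filler pause). The function names, the 4-frame (80 ms) minimum length, and the rule of dropping any segment that touches the first or last frame are illustrative assumptions, one possible reading of "removing short segments at the beginning and end", not the authors' documented procedure.

```python
FRAME_SEC = 0.02  # each frame covers 20 ms, as stated in the model description


def frames_to_segments(labels):
    """Group consecutive frames labelled 1 into (start_frame, end_frame) pairs.

    end_frame is exclusive, so a segment's length in frames is end - start.
    """
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new filler-pause run begins here
        elif label != 1 and start is not None:
            segments.append((start, i))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(labels)))  # run reaches the end of the audio
    return segments


def postprocess(segments, n_frames, min_frames=4):
    """Drop segments that are too short or that touch either edge of the audio.

    min_frames=4 (80 ms) is a hypothetical threshold chosen for illustration.
    """
    kept = []
    for start, end in segments:
        too_short = (end - start) < min_frames
        touches_edge = start == 0 or end == n_frames
        if not (too_short or touches_edge):
            kept.append((start, end))
    return kept


def to_seconds(segments):
    """Convert frame-index segments to (start_sec, end_sec) pairs."""
    return [(round(s * FRAME_SEC, 2), round(e * FRAME_SEC, 2)) for s, e in segments]


labels = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1]
raw = frames_to_segments(labels)
clean = postprocess(raw, n_frames=len(labels))
```

Working in frame indices until the final conversion avoids floating-point comparisons when testing segment length or edge contact.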

Model Capabilities

Audio frame classification
Filler pause detection
Multilingual speech analysis

Use Cases

Speech processing
Speech transcription preprocessing
Identify and label filler pauses before transcription to improve accuracy
Reduces non-semantic content in transcription results
Speech quality analysis
Analyze the frequency of filler pauses in speeches or conversations to assess oral fluency
Provides quantitative metrics for speech training or language learning
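
As a rough sketch of the quantitative metrics mentioned above, the following assumes filler pauses have already been detected and are available as (start_sec, end_sec) pairs. The function name and the two metrics chosen (pauses per minute and the share of speaking time spent in pauses) are illustrative assumptions, not something the model itself outputs.

```python
def pause_metrics(segments, total_sec):
    """Summarise detected filler pauses for a recording of total_sec seconds.

    segments: list of (start_sec, end_sec) pairs, one per detected pause.
    """
    if total_sec <= 0:
        raise ValueError("total_sec must be positive")
    count = len(segments)
    pause_time = sum(end - start for start, end in segments)
    return {
        "count": count,                          # number of detected pauses
        "per_minute": count / (total_sec / 60),  # pause frequency
        "time_share": pause_time / total_sec,    # fraction of audio spent in pauses
    }


# Hypothetical detections in a one-minute recording
metrics = pause_metrics([(1.0, 1.5), (10.0, 10.25), (30.0, 30.25)], total_sec=60.0)
```

Comparing these values across recordings of the same speaker gives a simple way to track fluency over time.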