Wav2vec2 Large Robust Ft Swbd 300h

Developed by facebook
This model is Facebook's Wav2Vec2-Large-Robust fine-tuned on 300 hours of the Switchboard telephone speech corpus for telephone speech recognition.
Downloads 2,543
Release Time: 3/2/2022

Model Overview

An automatic speech recognition (ASR) model for telephone speech, built to stay accurate on noisy recordings. Input audio must be sampled at 16 kHz.
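Telephone recordings are commonly stored at 8 kHz, so they usually need resampling before they match the 16 kHz input rate. The sketch below shows the idea with plain linear interpolation in pure Python. The helper name `resample_linear` is hypothetical and not part of any library, and linear interpolation is only a stand-in: a dedicated audio library's resampler is the better choice for real transcription work.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (illustrative only).

    samples  -- sequence of floats (one channel)
    src_rate -- sampling rate of `samples` in Hz, e.g. 8000 for telephone audio
    dst_rate -- target sampling rate in Hz (16 kHz is what the model expects)
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour in the source signal
        hi = min(lo + 1, last)     # right neighbour, clamped at the end
        frac = pos - lo            # distance from the left neighbour
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Example: a short 8 kHz snippet becomes twice as long at 16 kHz.
narrowband = [0.0, 1.0, 0.0, -1.0]
wideband = resample_linear(narrowband, src_rate=8_000)
print(len(narrowband), "->", len(wideband))
```

Resampling changes only the sample rate. It does not restore the frequency content above 4 kHz that 8 kHz telephone audio never captured.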

Model Features

Multi-domain Pre-training
Pre-training combines data from several domains: audiobooks (Libri-Light), read speech (CommonVoice), and telephone speech (Switchboard and Fisher).
Noise Robustness
Targets noisy telephone speech: the model is fine-tuned on 300 hours of the Switchboard telephone corpus.
Cross-domain Adaptation
Research has shown that pre-training with unlabeled data from the target domain significantly improves the model's performance on both in-domain and out-of-domain data.

Model Capabilities

English Speech-to-Text
Noisy Environment Speech Recognition
Telephone Speech Transcription
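
The capabilities above can be exercised with a short script. This is a minimal sketch, assuming the Hugging Face `transformers` and `torch` packages are installed and that the checkpoint is published on the Hub as `facebook/wav2vec2-large-robust-ft-swbd-300h` (an identifier inferred from the model name and developer above, not stated on this page). The function name `transcribe` is hypothetical. Greedy CTC decoding is used for simplicity.

```python
# Assumed Hub identifier, inferred from the model title and developer.
MODEL_ID = "facebook/wav2vec2-large-robust-ft-swbd-300h"
EXPECTED_RATE = 16_000  # the model expects 16 kHz input


def transcribe(samples, sampling_rate, model_id=MODEL_ID):
    """Return the greedy CTC transcription of one mono recording.

    samples       -- sequence of floats (or a 1-D NumPy array), one channel
    sampling_rate -- sampling rate of `samples` in Hz; must be 16000
    """
    if sampling_rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the recording first"
        )

    # Heavy dependencies are imported lazily so the rate check above
    # works even where they are not installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Normalise the waveform and wrap it in a batch of size one.
    inputs = processor(samples, sampling_rate=sampling_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: pick the most likely token per frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(waveform, 16_000)` downloads the checkpoint on first use. For many recordings, loading the processor and model once outside the function avoids repeated setup cost.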

Use Cases

Speech Transcription Services
Automatic Transcription of Customer Service Calls
Automatically converts call center conversations into text records
Maintains high recognition accuracy in noisy telephone environments
Speech Analysis
Call Content Analysis
Analyzes content of telephone recordings for business or research scenarios