
Wav2Vec2 Large Robust Ft Libri 960h

Developed by Facebook
This model is a fine-tuned version of Facebook's Wav2Vec2, specialized in speech recognition. It was pre-trained on diverse speech data and fine-tuned on LibriSpeech, giving it strong robustness across domains.
Downloads 161.65k
Release Time: 3/2/2022

Model Overview

This is an Automatic Speech Recognition (ASR) model based on the wav2vec2-large-robust architecture. It was pre-trained on diverse speech data and fine-tuned on 960 hours of LibriSpeech data, making it well suited to English speech-to-text tasks.
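For reference, the model can be used with the Hugging Face Transformers library. The sketch below is a minimal example, not an official recipe: the audio file name is a placeholder, and it assumes the checkpoint is available on the Hub under facebook/wav2vec2-large-robust-ft-libri-960h.

```python
# Minimal transcription sketch with Hugging Face Transformers.
# "example.wav" is a placeholder; the model expects 16 kHz mono audio.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "facebook/wav2vec2-large-robust-ft-libri-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load the audio and resample it to the 16 kHz rate used during training.
speech, _ = librosa.load("example.wav", sr=16000)

# Convert the raw waveform into model inputs.
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# Run inference and decode the highest-scoring CTC path greedily.
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```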

Model Features

Multi-domain Pre-training
The model was pre-trained on various speech data, including read speech (Libri-Light), crowdsourced speech (CommonVoice), and telephone speech (Switchboard/Fisher), enhancing its robustness.
Target Domain Fine-tuning
Fine-tuned on 960 hours of LibriSpeech read speech, improving recognition accuracy in read-speech scenarios.
Strong Robustness
Specifically designed to handle speech data from different domains, the model performs well on both in-domain and out-of-domain data, reducing the performance gap between them by 66%-73%.

Model Capabilities

English speech recognition
Read speech transcription
Telephone speech transcription (see the resampling sketch after this list)
Crowdsourced speech transcription
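Telephone recordings are often captured at 8 kHz, while wav2vec 2.0 checkpoints expect 16 kHz mono input, so such audio usually needs resampling first. The snippet below is a minimal sketch assuming torchaudio is installed; the file name is hypothetical.

```python
# Minimal resampling sketch; "call_8khz.wav" is a hypothetical telephone recording.
import torchaudio

waveform, sample_rate = torchaudio.load("call_8khz.wav")

# wav2vec 2.0 checkpoints expect 16 kHz input, so resample if needed.
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

# Collapse to mono by averaging channels, keeping the channel dimension.
waveform = waveform.mean(dim=0, keepdim=True)
```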

Use Cases

Speech Transcription
Audiobook Transcription
Convert read audiobook audio into text
Performs well on the LibriSpeech test set (see the evaluation sketch after this list)
Telephone Speech Transcription
Transcribe telephone call content
Performs well on the Switchboard and Fisher datasets
Voice Assistants
Voice Command Recognition
Recognize user voice commands and convert them to text
Suitable for various speech environments
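To put a number on the LibriSpeech claim above, one can compute word error rate (WER) on a few test samples. The sketch below is a rough example, assuming the librispeech_asr dataset (with audio and text columns) can be streamed from the Hugging Face Hub and that the jiwer package is installed.

```python
# Rough WER evaluation sketch on a handful of LibriSpeech test-clean samples.
import torch
import jiwer
from datasets import load_dataset
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_id = "facebook/wav2vec2-large-robust-ft-libri-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

# Stream a small number of samples to keep the sketch lightweight.
dataset = load_dataset("librispeech_asr", "clean", split="test", streaming=True)

references, hypotheses = [], []
for sample in dataset.take(20):
    inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    hypotheses.append(processor.batch_decode(predicted_ids)[0])
    references.append(sample["text"])

print("WER:", jiwer.wer(references, hypotheses))
```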