Data2vec Audio Large 960h

Developed by Facebook
data2vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This large audio model is pre-trained and fine-tuned on 960 hours of LibriSpeech audio for automatic speech recognition (ASR).
Downloads: 2,531
Release Time: 4/2/2022

Model Overview

A speech recognition model built on the data2vec framework. It is pre-trained with self-supervised learning and then fine-tuned on the LibriSpeech dataset to convert English speech to text.

Model Features

General self-supervised learning framework
Applies the same data2vec learning method across modalities: the model predicts latent representations of the full input rather than modality-specific local targets
High-performance speech recognition
Achieves a word error rate (WER) of 1.89 on the "clean" and 4.07 on the "other" LibriSpeech test sets
Large-scale training data
Trained on 960 hours of LibriSpeech audio data
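
For readers unfamiliar with the WER metric quoted above, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` is illustrative and not part of any library, and the sketch assumes whitespace tokenization with no text normalization. Published figures such as those above are conventionally given as percentages, so the fraction returned here would be multiplied by 100.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative helper: splits on whitespace and compares words exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

For a whole test set, WER is usually computed from the total number of errors over the total number of reference words, rather than by averaging per-utterance scores.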

Model Capabilities

English speech recognition
Audio-to-text conversion
Processes audio sampled at 16 kHz
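
The capabilities above can be sketched with the Hugging Face `transformers` library. This is a minimal example under stated assumptions, not an official usage guide: the Hub identifier `facebook/data2vec-audio-large-960h` is assumed (this page does not state it), the helper names `check_sample_rate` and `transcribe` are illustrative, and greedy decoding of the CTC output is only one possible decoding strategy.

```python
# Assumed Hub identifier; it is not stated on this page.
MODEL_ID = "facebook/data2vec-audio-large-960h"
EXPECTED_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def check_sample_rate(sample_rate: int) -> None:
    """Raise ValueError if the audio is not sampled at 16 kHz."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the waveform first"
        )


def transcribe(waveform, sample_rate: int) -> str:
    """Transcribe a 1-D float array of mono audio samples to text."""
    check_sample_rate(sample_rate)

    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from transformers import Data2VecAudioForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Data2VecAudioForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Convert the raw samples into model input tensors.
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: take the most likely token at each frame,
    # then let the processor collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another rate needs resampling before it is passed in; one option is `librosa.load(path, sr=16000)`, which loads a file and resamples it in a single step.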

Use Cases

Speech transcription
Meeting transcription
Automatically converts meeting recordings into text transcripts
Podcast content indexing
Creates searchable text indexes for podcast audio
Assistive technology
Hearing assistance
Provides real-time speech-to-text services for people who are deaf or hard of hearing
© 2025 AIbase