Data2vec Audio Large 100h
Data2Vec is a general self-supervised learning framework applicable to speech, natural language processing, and computer vision tasks. This is the large variant, pre-trained and fine-tuned on 100 hours of LibriSpeech audio data.
Downloads 46
Release Time: 4/2/2022
Model Overview
Data2Vec-Audio-Large-100h is a speech recognition model built on self-supervised learning. It takes speech audio sampled at 16 kHz as input and outputs the corresponding text transcription.
Model Features
General self-supervised learning framework
The Data2Vec framework can handle speech, natural language processing, and computer vision tasks with the same learning approach, achieving unified cross-modal learning.
Self-distillation setup
Given a masked view of the input, a standard Transformer predicts latent representations of the complete input, rather than modality-specific targets that are local in nature (such as words, visual tokens, or units of speech).
High performance
On major benchmarks for speech recognition, image classification, and natural language understanding, the method either sets a new state of the art or performs competitively with mainstream approaches.
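The self-distillation setup described above can be illustrated with a small NumPy sketch. It is not the model's training code. It assumes, following the published data2vec description rather than this page, that the targets come from a teacher whose weights are an exponential moving average of the student's, and it uses a plain mean squared error where the original work uses a different regression loss. The names ema_update, masked_regression_loss, and tau are hypothetical.

```python
import numpy as np


def ema_update(teacher, student, tau=0.999):
    """Move each teacher parameter toward the student: t <- tau*t + (1-tau)*s.

    teacher, student: dicts mapping parameter names to arrays of equal shape.
    tau is an illustrative decay value, not a setting taken from the model.
    """
    return {name: tau * teacher[name] + (1.0 - tau) * student[name]
            for name in teacher}


def masked_regression_loss(student_pred, teacher_target, mask):
    """Mean squared error computed over masked time steps only.

    student_pred:   (time, dim) predictions made from the masked input view
    teacher_target: (time, dim) latent representations of the complete input
    mask:           (time,) boolean, True where the student's input was masked
    """
    diff = student_pred[mask] - teacher_target[mask]
    return float(np.mean(diff ** 2))


# Toy example: 6 time steps, 4 latent dimensions, 3 masked steps.
rng = np.random.default_rng(0)
target = rng.normal(size=(6, 4))
pred = target.copy()
mask = np.array([False, True, True, False, False, True])
pred[1] += 1.0  # an error at a masked step contributes to the loss
pred[0] += 5.0  # an error at an unmasked step is ignored
loss = masked_regression_loss(pred, target, mask)
```

The point of the sketch is that the loss is measured only where the student could not see the input, so the student has to infer the teacher's representation of those steps from the surrounding context.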
Model Capabilities
Speech recognition
Audio transcription
Use Cases
Speech transcription
Audio file transcription
Transcribe 16kHz sampled speech audio files into text.
Highly accurate text output
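A minimal sketch of the transcription use case, assuming the checkpoint is published on the Hugging Face Hub as facebook/data2vec-audio-large-100h and that the transformers and torch packages are installed. The helper names resample_linear and transcribe are hypothetical, and the linear-interpolation resampler is only illustrative; a dedicated audio library would normally handle resampling.

```python
import numpy as np

TARGET_RATE = 16_000  # the model expects speech sampled at 16 kHz


def resample_linear(waveform, orig_rate, target_rate=TARGET_RATE):
    """Resample a mono waveform by linear interpolation (illustrative only)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if orig_rate == target_rate:
        return waveform
    n_out = int(round(len(waveform) * target_rate / orig_rate))
    old_t = np.arange(len(waveform)) / orig_rate
    new_t = np.arange(n_out) / target_rate
    return np.interp(new_t, old_t, waveform).astype(np.float32)


def transcribe(waveform, sampling_rate,
               model_id="facebook/data2vec-audio-large-100h"):
    """Return the text for a mono waveform. The model ID is an assumption."""
    # Imported lazily so the resampling helper works without these packages.
    import torch
    from transformers import Data2VecAudioForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Data2VecAudioForCTC.from_pretrained(model_id)

    audio = resample_linear(waveform, sampling_rate)
    inputs = processor(audio, sampling_rate=TARGET_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: take the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe(samples, 44100) on a mono array loaded from an audio file would first bring it to 16 kHz and then return the decoded text. The first call downloads the model weights.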