Data2vec Audio Large
Data2Vec-Audio-Large is the large Data2Vec audio model, pre-trained with self-supervised learning on speech audio sampled at 16 kHz. It ships without a tokenizer or task head, so it must be fine-tuned on transcribed speech before it can be used for speech recognition.
Downloads: 97
Release Time: 4/2/2022
Model Overview
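This model is the audio version of Facebook's Data2Vec framework. It learns latent representations of speech through self-distillation: a student network sees a masked version of the input and learns to predict the representations that a teacher network produces for the unmasked input, where the teacher's weights are an exponential moving average of the student's. The resulting model can be fine-tuned for tasks such as speech recognition.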
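The teacher side of the self-distillation loop can be sketched as follows. This is a minimal, framework-free illustration, not the fairseq or Hugging Face implementation: parameters are plain lists of floats, and the names `tau_schedule` and `ema_update` are hypothetical. It assumes, as described in the data2vec paper, that the EMA decay rate is ramped linearly from a start value to an end value and then held constant; the specific numbers below are illustrative.

```python
def tau_schedule(step: int, tau_start: float, tau_end: float, ramp_steps: int) -> float:
    """Linearly increase the EMA decay from tau_start to tau_end, then hold it."""
    if step >= ramp_steps:
        return tau_end
    return tau_start + (tau_end - tau_start) * step / ramp_steps


def ema_update(teacher: list[float], student: list[float], tau: float) -> list[float]:
    """Return teacher <- tau * teacher + (1 - tau) * student, element-wise."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


# Toy usage: the teacher starts as a copy of the student, then drifts slowly toward it.
student = [0.5, -1.0, 2.0]
teacher = list(student)
student = [0.6, -0.9, 1.8]  # pretend one optimizer step changed the student
tau = tau_schedule(step=0, tau_start=0.999, tau_end=0.9999, ramp_steps=30_000)
teacher = ema_update(teacher, student, tau)
print(tau, teacher)
```

Because the teacher is never trained by gradient descent, it changes slowly and provides stable regression targets for the student.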
Model Features
Unified self-supervised learning framework
Adopts the Data2Vec framework, which uses the same learning method for speech, NLP, and computer vision (a separate model is trained for each modality).
Contextual latent representation prediction
Instead of predicting modality-specific targets that are local in nature (such as words, visual tokens, or units of speech), the model predicts contextualized latent representations that contain information from the entire input.
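A simplified sketch of how such contextualized targets can be built and regressed is shown below, in plain Python. It assumes the recipe from the data2vec paper: average the outputs of the top K teacher layers after normalizing each one (instance normalization over time for speech), then apply a Smooth L1 loss only at masked time steps. The function names, the choice of K, and the `beta` value are illustrative assumptions, and real implementations operate on tensors and normalize a specific sub-layer output rather than whole-layer outputs.

```python
import math

Matrix = list[list[float]]  # time steps x channels


def instance_norm(layer: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each channel to zero mean and unit variance over time (no learned parameters)."""
    T, C = len(layer), len(layer[0])
    out = [[0.0] * C for _ in range(T)]
    for c in range(C):
        col = [layer[t][c] for t in range(T)]
        mean = sum(col) / T
        var = sum((x - mean) ** 2 for x in col) / T
        for t in range(T):
            out[t][c] = (col[t] - mean) / math.sqrt(var + eps)
    return out


def build_targets(teacher_layers: list[Matrix], top_k: int) -> Matrix:
    """Average the instance-normalized outputs of the top_k highest teacher layers."""
    top = [instance_norm(layer) for layer in teacher_layers[-top_k:]]
    T, C = len(top[0]), len(top[0][0])
    return [[sum(layer[t][c] for layer in top) / top_k for c in range(C)] for t in range(T)]


def smooth_l1(pred: float, target: float, beta: float) -> float:
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def masked_loss(student_out: Matrix, targets: Matrix, masked: list[int], beta: float) -> float:
    """Mean Smooth L1 loss over the masked time steps only."""
    terms = [smooth_l1(student_out[t][c], targets[t][c], beta)
             for t in masked for c in range(len(targets[t]))]
    return sum(terms) / len(terms)


# Toy teacher outputs: 3 layers, 4 time steps, 2 channels.
teacher_layers = [[[float(l + t), float(l * t)] for t in range(4)] for l in range(3)]
targets = build_targets(teacher_layers, top_k=2)
student_out = [[0.0, 0.0] for _ in range(4)]
loss = masked_loss(student_out, targets, masked=[1, 3], beta=0.25)
print(loss)
```

Each target mixes several deep layers of a network that saw the whole (unmasked) input, which is what makes it contextual rather than local.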
16kHz audio support
Pre-trained on speech sampled at 16 kHz; input audio should be sampled at 16 kHz as well.
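Audio recorded at another rate has to be resampled to 16 kHz before it is fed to the model. The sketch below shows the idea with naive linear interpolation in plain Python; `resample_linear` is a hypothetical helper, and because it applies no anti-aliasing filter it is only an illustration. In practice a proper resampler such as `torchaudio.functional.resample` or `librosa.resample` is the safer choice.

```python
TARGET_SR = 16_000  # the model was pre-trained on 16 kHz audio


def resample_linear(samples: list[float], orig_sr: int, target_sr: int = TARGET_SR) -> list[float]:
    """Naive linear-interpolation resampling (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return list(samples)
    n_out = len(samples) * target_sr // orig_sr
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr          # fractional position in the input signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of 48 kHz audio becomes 16,000 samples at 16 kHz.
audio_16k = resample_linear([0.0] * 48_000, orig_sr=48_000)
print(len(audio_16k))
```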
Model Capabilities
Speech feature extraction
Self-supervised learning
Speech recognition foundation model
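Because the pre-trained checkpoint outputs only hidden states, a speech recognition system is usually built by adding a CTC output layer and fine-tuning on transcribed speech (Hugging Face Transformers exposes this setup as `Data2VecAudioForCTC`). At inference, per-frame label scores are often turned into text with greedy CTC decoding, sketched below in plain Python. The vocabulary, blank index, and frame scores are made up for illustration; a real fine-tuned model supplies its own vocabulary through its tokenizer.

```python
def argmax_frames(logits: list[list[float]]) -> list[int]:
    """Pick the highest-scoring label index at each frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids: list[int], vocab: list[str], blank_id: int = 0) -> str:
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank_id:
            out.append(vocab[idx])
        prev = idx  # blanks also reset prev, so "o _ o" yields two o's
    return "".join(out)


# Hypothetical vocabulary: index 0 is the CTC blank.
vocab = ["_", " ", "b", "k", "o"]
frames = [2, 2, 4, 0, 4, 3, 3]  # b b o _ o k k
print(ctc_greedy_decode(frames, vocab))
```

The blank between the two `o` frames is what lets CTC emit a genuinely doubled letter instead of collapsing it.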
Use Cases
Speech processing
Speech recognition system
Used as a foundation model for building speech recognition systems
After fine-tuning, Data2Vec speech models reported new state-of-the-art or competitive results against predominant approaches on the LibriSpeech speech recognition benchmark
Speech feature extraction
Extracts high-level feature representations of speech
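When the model is used as a feature extractor, it returns one hidden vector per frame, so it helps to know how many frames a clip produces. The sketch below computes this in plain Python, assuming the wav2vec 2.0-style convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, which we believe match the Hugging Face `Data2VecAudioConfig` defaults) with no padding. Under that assumption each frame covers 25 ms of audio with a 20 ms hop, and the large model's output for a clip has shape (frames, hidden size), with a hidden size of 1024 for the large configuration.

```python
# Assumed feature-encoder layout: 7 conv layers, no padding.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000


def num_frames(num_samples: int) -> int:
    """Output length of the conv stack: L -> floor((L - k) / s) + 1 per layer."""
    length = num_samples
    for k, s in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - k) // s + 1
    return max(length, 0)


def receptive_field_and_hop() -> tuple[int, int]:
    """Receptive field and total stride (hop) of the conv stack, in samples."""
    field, hop = 1, 1
    for k, s in zip(CONV_KERNELS, CONV_STRIDES):
        field += (k - 1) * hop
        hop *= s
    return field, hop


field, hop = receptive_field_and_hop()
print(num_frames(SAMPLE_RATE), field, hop)  # frames per second of audio, window, hop
```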