Data2vec Audio Large

Developed by Facebook
Data2Vec-Audio-Large is a large model pre-trained on speech audio sampled at 16 kHz using a self-supervised learning framework. It is suitable as a base for tasks such as speech recognition.
Downloads: 97
Release Time: 4/2/2022

Model Overview

This model is the audio version of Facebook's Data2Vec framework. It learns latent representations of speech through self-distillation and can serve as a starting point for tasks such as speech recognition.
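
The self-distillation idea can be illustrated with a toy sketch. In Data2Vec, a teacher network whose weights are an exponential moving average (EMA) of the student's weights encodes the full input, and the student regresses the teacher's representations from a masked view of the same input. The code below uses plain Python lists purely for illustration. The function names `ema_update` and `masked_regression_loss`, the decay value, and the use of a plain mean squared error as the regression loss are assumptions made here, not the model's actual implementation.

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight a small step toward the student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def masked_regression_loss(student_pred, teacher_target, mask):
    """Mean squared error computed on masked time steps only."""
    errors = [
        (p - t) ** 2
        for p, t, m in zip(student_pred, teacher_target, mask)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy weights: the teacher drifts slowly toward the student.
teacher = [0.0, 0.0]
student = [1.0, -1.0]
teacher = ema_update(teacher, student, decay=0.9)

# Toy targets: only the two masked positions contribute to the loss.
loss = masked_regression_loss(
    student_pred=[0.5, 0.2, 0.9],
    teacher_target=[0.5, 0.0, 1.0],
    mask=[False, True, True],
)
```

Because the teacher sees the unmasked input, its targets carry context from the whole utterance, which is what the student is asked to reconstruct at the masked positions.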

Model Features

Unified self-supervised learning framework
Adopts the Data2Vec framework, which uses the same learning method for speech, NLP, and computer vision.
Contextual latent representation prediction
Rather than predicting local, modality-specific targets, the model predicts contextualized latent representations that carry information from the entire input.
16 kHz audio support
Pre-trained on speech audio sampled at 16 kHz, so input audio should be sampled at the same rate.
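
Audio recorded at another sampling rate would need to be resampled before it is passed to the model. The sketch below shows the idea with simple linear interpolation in plain Python. The helper name `resample_linear` is hypothetical, and the method applies no anti-aliasing filter, so it is a rough illustration only. In practice a dedicated resampler such as `librosa.resample` or `torchaudio.functional.resample` is the safer choice.

```python
TARGET_RATE = 16_000  # sampling rate the model was pre-trained on


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (no anti-alias filter)."""
    if source_rate == target_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of silence at 8 kHz becomes 16,000 samples at 16 kHz.
one_second_8k = [0.0] * 8_000
resampled = resample_linear(one_second_8k, source_rate=8_000)
```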

Model Capabilities

Speech feature extraction
Self-supervised learning
Speech recognition foundation model

Use Cases

Speech processing
Speech recognition system
Used as a foundation model for building speech recognition systems
The Data2Vec authors report state-of-the-art or competitive performance relative to predominant approaches on major speech recognition benchmarks
Speech feature extraction
Extracts high-level feature representations of speech
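
Feature extraction could look roughly like the sketch below, which assumes the Hugging Face `transformers` library and the Hub model ID `facebook/data2vec-audio-large` (neither is stated on this page). The helper names `expected_num_frames` and `extract_features` are hypothetical. The frame-count helper further assumes the wav2vec 2.0 style convolutional front end that `Data2VecAudioConfig` uses by default, and the checkpoint's actual configuration should be checked before relying on it.

```python
# Assumed (kernel, stride) per convolutional layer: the Data2VecAudioConfig
# defaults, which follow the wav2vec 2.0 feature encoder.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def expected_num_frames(num_samples):
    """Number of feature frames produced for `num_samples` input samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def extract_features(waveform, model_id="facebook/data2vec-audio-large"):
    """Return last-layer hidden states for one mono 16 kHz waveform.

    Requires `torch` and `transformers`; weights are downloaded on first use.
    """
    # Imported lazily so the frame-count helper works without these packages.
    import torch
    from transformers import Data2VecAudioModel, Wav2Vec2FeatureExtractor

    extractor = Wav2Vec2FeatureExtractor(sampling_rate=16_000, do_normalize=True)
    model = Data2VecAudioModel.from_pretrained(model_id).eval()
    inputs = extractor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state  # shape: (batch, frames, hidden_size)


frames_per_second = expected_num_frames(16_000)  # 49 under the assumed layers
```

Under these assumptions, one second of 16 kHz audio maps to 49 feature vectors, roughly one every 20 ms. Calling `extract_features` on a list or array of float samples would return those vectors for downstream use.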