
Data2vec Audio Large 10m

Developed by facebook
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This large audio model is pre-trained, then fine-tuned on 10 minutes of Librispeech data, and expects speech input sampled at 16 kHz.
Downloads 19
Release Time: 4/2/2022

Model Overview

Data2Vec-Audio-Large-10m is a self-supervised speech model used primarily for speech recognition. It is built on the data2vec framework, which applies one learning method across data modalities: the model learns by predicting latent representations of the complete input from a masked view of that input.

Model Features

Unified self-supervised learning framework
Applies the same learning method to speech, natural language processing, and computer vision, rather than a separate objective for each modality.
Context-aware latent representation prediction
Instead of predicting modality-specific targets that are local in nature (such as words, visual tokens, or units of speech), the model predicts context-aware latent representations that carry information from the entire input.
High performance
The data2vec framework achieves state-of-the-art or competitive results on major benchmarks for speech recognition, image classification, and natural language understanding.
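
The latent-prediction idea in the features above can be illustrated with a toy sketch. It assumes the setup described in the data2vec paper: a teacher network whose weights are an exponential moving average of the student's weights, regression targets built by averaging the teacher's top K layer outputs on the unmasked input, and a loss computed only at masked positions. The function names, the scalar per-position features, and the plain mean squared error are simplifications for illustration, not the released training code.

```python
def ema_update(teacher, student, tau):
    """Move each teacher weight toward the student: t <- tau * t + (1 - tau) * s."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def average_top_k(layer_outputs, k):
    """Average the last k layer outputs position by position to form the target."""
    top = layer_outputs[-k:]
    return [sum(values) / len(top) for values in zip(*top)]


def masked_mse(prediction, target, mask):
    """Mean squared error over masked positions only (mask[i] is True if masked)."""
    errors = [(p - t) ** 2 for p, t, m in zip(prediction, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


# Teacher sees the full input; each inner list is one layer's output per position.
teacher_layers = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
target = average_top_k(teacher_layers, k=2)

# Student sees a masked input and is scored only where the input was masked.
student_prediction = [1.0, 0.0, 3.0]
mask = [False, True, False]
loss = masked_mse(student_prediction, target, mask)

# After a student update, the teacher weights follow by moving average.
new_teacher = ema_update([1.0, 1.0], [0.0, 2.0], tau=0.5)
```

Because the target comes from the teacher's view of the whole input, each masked position is regressed toward a representation that already reflects its context, which is what distinguishes this objective from predicting a local token or unit.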

Model Capabilities

Speech recognition
Audio feature extraction
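
Audio feature extraction could look like the following minimal sketch. It assumes the checkpoint is published on the Hugging Face Hub as `facebook/data2vec-audio-large-10m` (the identifier is not stated on this page) and that the `transformers` and `torch` packages are installed. `mean_pool` and `extract_features` are hypothetical helper names introduced here.

```python
MODEL_ID = "facebook/data2vec-audio-large-10m"  # assumed Hub id, not given on this page


def mean_pool(frames):
    """Average per-frame feature vectors into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def extract_features(waveform, sampling_rate=16_000):
    """Return the encoder's per-frame hidden states for one mono waveform."""
    # Imported lazily: these packages are only needed when the model is run.
    import torch
    from transformers import Data2VecAudioModel, Wav2Vec2FeatureExtractor

    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Data2VecAudioModel.from_pretrained(MODEL_ID)

    inputs = feature_extractor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(inputs.input_values).last_hidden_state  # (1, frames, hidden)
    return hidden[0].tolist()
```

Calling `mean_pool(extract_features(waveform))` would give a single fixed-size vector per utterance. Mean pooling is one simple choice among several, not something the model card prescribes.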

Use Cases

Speech processing
Speech-to-text
Convert speech audio into text content
High-accuracy speech recognition results
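
The speech-to-text use case can be sketched as follows. This is a minimal example, assuming the checkpoint is available on the Hugging Face Hub as `facebook/data2vec-audio-large-10m` (an identifier not stated on this page) and that `transformers` and `torch` are installed. The `transcribe` helper is hypothetical, and greedy CTC decoding is used for simplicity.

```python
MODEL_ID = "facebook/data2vec-audio-large-10m"  # assumed Hub id, not given on this page
EXPECTED_SAMPLE_RATE = 16_000  # the model expects 16 kHz speech audio


def transcribe(waveform, sampling_rate):
    """Transcribe one mono waveform (a sequence of floats) with greedy CTC decoding."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the input first"
        )

    # Imported lazily so the sample-rate check works without the heavy packages.
    import torch
    from transformers import Data2VecAudioForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Data2VecAudioForCTC.from_pretrained(MODEL_ID)

    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits  # (1, frames, vocab_size)

    # Greedy decoding: pick the most likely token per frame, then let the
    # processor collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another rate should be resampled to 16 kHz before calling `transcribe`, since the function rejects any other sampling rate rather than guessing.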