Data2vec Audio Base 100h

Developed by facebook
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This audio base model was pre-trained and fine-tuned on 100 hours of LibriSpeech audio data.
Downloads 4,369
Release Time: 3/2/2022

Model Overview

Data2Vec-Audio is a speech processing model trained with self-supervised learning, using a framework that applies the same method to data from different modalities. During training, the model predicts latent representations of the full input from a masked view of that input, which makes it suitable for tasks such as speech recognition.

Model Features

General self-supervised learning framework
Uses the same learning method for speech, natural language processing, and computer vision tasks, so a single approach covers all three modalities.
Contextual latent representation prediction
Rather than predicting modality-specific targets that are local in nature, the model predicts contextualized latent representations that contain information from the entire input.
High-performance results
Achieves state-of-the-art or competitive performance relative to predominant approaches on major benchmarks such as speech recognition.
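
The latent-prediction idea described above can be sketched in a few lines of plain Python. This is an illustration only, not the released training code. It assumes the setup described in the published data2vec work: a teacher network whose weights are an exponential moving average of the student's weights, targets formed by averaging the teacher's top layers, and a regression loss computed only at masked time steps. The function names, the flat weight vectors, and the squared-error loss are all hypothetical simplifications.

```python
from typing import List, Sequence

Vector = List[float]


def ema_update(teacher: Vector, student: Vector, tau: float) -> Vector:
    """Move teacher weights toward the student: tau * teacher + (1 - tau) * student."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def average_top_k_layers(
    layer_outputs: Sequence[Sequence[Vector]], k: int
) -> List[Vector]:
    """Build per-time-step targets by averaging the last k layers.

    layer_outputs[l][t] is the teacher's vector at layer l and time step t.
    """
    top = layer_outputs[-k:]
    num_steps = len(top[0])
    dim = len(top[0][0])
    return [
        [sum(layer[t][d] for layer in top) / k for d in range(dim)]
        for t in range(num_steps)
    ]


def masked_regression_loss(
    predictions: Sequence[Vector],
    targets: Sequence[Vector],
    mask: Sequence[bool],
) -> float:
    """Mean squared error over masked time steps only (assumed loss form)."""
    total, count = 0.0, 0
    for pred, tgt, is_masked in zip(predictions, targets, mask):
        if is_masked:
            total += sum((p - q) ** 2 for p, q in zip(pred, tgt))
            count += len(pred)
    return total / count if count else 0.0
```

The key point of the sketch is that the targets come from the teacher's view of the complete, unmasked input, while the student only sees the masked version and is scored at the positions it could not see.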

Model Capabilities

Speech recognition
Audio feature extraction

Use Cases

Speech processing
Speech-to-text
Convert speech audio into text transcriptions
High-accuracy speech recognition results
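
A minimal sketch of the speech-to-text use case with the Hugging Face transformers library follows. Several things here are assumptions not stated on this page: the Hub identifier "facebook/data2vec-audio-base-100h", the 16 kHz input sampling rate, and the helper names validate_sample_rate and transcribe, which are hypothetical. The heavy dependencies (torch, transformers) are imported inside the function so the file can be loaded without them.

```python
from typing import Sequence

# Assumed Hub identifier for this model; verify before use.
MODEL_ID = "facebook/data2vec-audio-base-100h"
# Assumed input sampling rate in Hz; verify against the model's documentation.
EXPECTED_SAMPLE_RATE = 16_000


def validate_sample_rate(sample_rate: int) -> None:
    """Raise ValueError if the audio is not at the assumed sampling rate."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the waveform first"
        )


def transcribe(waveform: Sequence[float], sample_rate: int) -> str:
    """Transcribe a mono waveform with greedy CTC decoding."""
    validate_sample_rate(sample_rate)

    # Imported lazily so that defining this function needs no heavy packages.
    import torch
    from transformers import Data2VecAudioForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Data2VecAudioForCTC.from_pretrained(MODEL_ID)
    model.eval()

    inputs = processor(
        waveform,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding="longest",
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: pick the most likely token at each frame, then let the
    # processor collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe(samples, 16000) with a list or array of float samples downloads the checkpoint on first use, so it needs network access and the torch and transformers packages installed.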
Speech data analysis
Extract features from speech for further analysis
Obtain latent representations of speech content
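
Feature extraction for speech data analysis might look like the following sketch, again using the Hugging Face transformers library. The Hub identifier, the 16 kHz sampling rate, and the helper names extract_features and mean_pool are assumptions for illustration. Mean pooling over time is only one simple way to turn frame-level representations into a single vector; it is not prescribed by the model.

```python
from typing import List, Sequence

# Assumed Hub identifier for this model; verify before use.
MODEL_ID = "facebook/data2vec-audio-base-100h"


def extract_features(
    waveform: Sequence[float], sample_rate: int = 16_000
) -> List[List[float]]:
    """Return one latent vector per audio frame from the encoder's last layer."""
    # Imported lazily so that defining this function needs no heavy packages.
    import torch
    from transformers import Data2VecAudioModel, Wav2Vec2FeatureExtractor

    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    # Loads only the encoder; the CTC head of the fine-tuned checkpoint is unused.
    model = Data2VecAudioModel.from_pretrained(MODEL_ID)
    model.eval()

    inputs = feature_extractor(
        waveform, sampling_rate=sample_rate, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(inputs.input_values)

    # last_hidden_state has shape (batch, frames, hidden_size); take item 0.
    return outputs.last_hidden_state[0].tolist()


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame vectors over time into one utterance-level vector."""
    if not frames:
        raise ValueError("no frames to pool")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]
```

A typical flow would be mean_pool(extract_features(samples)), giving a fixed-size vector per utterance that can feed clustering or a downstream classifier.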