Data2vec Audio Base 960h

Developed by facebook
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language processing. This model is a speech recognition model pre-trained and fine-tuned on 960 hours of LibriSpeech audio data.
Downloads 10.61k
Release Time: 3/2/2022

Model Overview

Data2Vec-Audio is a speech recognition model built on data2vec, a self-supervised method that applies the same learning approach to speech, natural language processing, and computer vision. Its core idea is to predict latent representations of the complete input from a masked view of that input, in a self-distillation setting.
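The self-distillation idea can be illustrated with a toy sketch. It assumes a teacher whose weights are an exponential moving average of the student's weights, and a squared-error loss computed only on masked positions. The names ema_update, masked_mse, and tau are hypothetical, and the plain Python lists stand in for real network weights and layer outputs; this is not the released training code.

```python
def ema_update(teacher, student, tau=0.999):
    """Move each teacher weight a small step toward the matching student weight."""
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def masked_mse(student_pred, teacher_target, mask):
    """Mean squared error over the masked positions only.

    student_pred:   student outputs computed from the masked input
    teacher_target: teacher representations computed from the complete input
    mask:           True where the student's input was masked
    """
    errors = [
        (p - t) ** 2
        for p, t, m in zip(student_pred, teacher_target, mask)
        if m
    ]
    if not errors:  # nothing was masked, so there is nothing to predict
        return 0.0
    return sum(errors) / len(errors)


# Toy values: two "weights" and three time steps, two of them masked.
teacher = ema_update(teacher=[1.0, 2.0], student=[0.0, 4.0], tau=0.5)
loss = masked_mse([0.5, 1.0, 2.0], [1.0, 1.0, 0.0], mask=[True, False, True])
print(teacher, loss)
```

Because the teacher sees the unmasked input, its targets carry context from the whole utterance, which is what the student has to reconstruct at the masked positions.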

Model Features

General Self-supervised Learning Framework
Uses the same learning approach for speech, natural language processing, and computer vision tasks, achieving a unified cross-modal learning framework.
Contextual Latent Representation Prediction
Rather than predicting local, modality-specific targets, the model predicts contextualized latent representations that carry information from the entire input, which is intended to improve generalization.
High-performance Speech Recognition
Achieves a word error rate (WER) of 2.77 on the LibriSpeech "clean" test set and 7.08 on the more challenging "other" test set.
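For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words, so a score of 2.77 corresponds to roughly 2.77 errors per 100 reference words. A minimal sketch follows; the function name word_error_rate and the whitespace tokenization are assumptions for illustration, and published evaluations typically normalize text (case, punctuation) first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Multiply the result by 100 to compare it with the percentages usually quoted for LibriSpeech.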

Model Capabilities

Speech recognition
Audio transcription
English speech processing
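A transcription call might look like the sketch below. It assumes the checkpoint is published on the Hugging Face Hub as facebook/data2vec-audio-base-960h, that the transformers and torch packages are installed, and that the audio is mono and sampled at 16 kHz. The wrapper function transcribe is hypothetical, and reloading the model on every call is done only to keep the example short.

```python
def transcribe(waveform, sampling_rate=16000):
    """Return a greedy CTC transcription of a mono waveform (1-D sequence of floats)."""
    if sampling_rate != 16000:
        raise ValueError("expected 16 kHz audio; resample before calling")

    # Imported lazily so the sampling-rate check works without the heavy dependencies.
    import torch
    from transformers import Data2VecAudioForCTC, Wav2Vec2Processor

    model_id = "facebook/data2vec-audio-base-960h"  # assumed Hub identifier
    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Data2VecAudioForCTC.from_pretrained(model_id)

    # Convert the raw waveform into model input values.
    inputs = processor(
        waveform,
        sampling_rate=sampling_rate,
        return_tensors="pt",
        padding="longest",
    )

    # Forward pass without gradients, then pick the most likely token per frame.
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)

    # batch_decode collapses repeated tokens and blanks into text.
    return processor.batch_decode(predicted_ids)[0]
```

In a real application the processor and model would be loaded once and reused across calls.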

Use Cases

Speech transcription
Automatic meeting minutes transcription
Automatically transcribes meeting recordings into text records to improve meeting efficiency.
WER of 2.77 on the LibriSpeech "clean" test set
Podcast content indexing
Automatically transcribes podcast content for easy searching and indexing.
WER of 7.08 on the more challenging LibriSpeech "other" test set
Assistive technology
Hearing assistance applications
Provides real-time speech-to-text services for individuals with hearing impairments.