AV HuBERT MuAViC Ru

Developed by nguyenvulebinh
AV-HuBERT is an audio-visual speech recognition model trained on the MuAViC multilingual audio-visual corpus, combining audio and visual modalities for robust performance.
Downloads 91
Release Time: 3/6/2025

Model Overview

AV-HuBERT is a self-supervised model for audio-visual speech recognition. It integrates audio and visual modalities, which makes recognition more robust, particularly in noisy environments.
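
To make "integrating audio and visual modalities" concrete, here is a minimal sketch of one common fusion approach: concatenating frame-aligned audio and visual feature vectors, with zeros standing in for a missing stream. It illustrates the general idea only and is not taken from this model's code. The function name fuse_frames, its parameters, and the toy feature sizes are all hypothetical.

```python
from typing import List, Optional

Frame = List[float]


def fuse_frames(
    audio: Optional[List[Frame]],
    video: Optional[List[Frame]],
    audio_dim: int,
    video_dim: int,
) -> List[Frame]:
    """Concatenate per-frame audio and visual features (hypothetical helper).

    A missing stream (None) is replaced by zero vectors, so the fused
    frame always has audio_dim + video_dim values.
    """
    if audio is None and video is None:
        raise ValueError("at least one modality is required")

    # Take the frame count from whichever stream is present.
    n_frames = len(audio) if audio is not None else len(video)

    if audio is None:
        audio = [[0.0] * audio_dim for _ in range(n_frames)]
    if video is None:
        video = [[0.0] * video_dim for _ in range(n_frames)]

    # Assumption: both streams were already resampled to the same frame rate.
    if len(audio) != len(video):
        raise ValueError("streams must be aligned to the same number of frames")

    # Frame-wise concatenation: audio features first, then visual features.
    return [a + v for a, v in zip(audio, video)]


# Toy input: two frames, 2-dim audio features and 1-dim visual features.
fused = fuse_frames([[1.0, 2.0], [3.0, 4.0]], [[0.5], [0.6]], audio_dim=2, video_dim=1)
video_only = fuse_frames(None, [[0.5]], audio_dim=2, video_dim=1)
```

Because the fused frame has a fixed size, a downstream recognizer can accept audio-only, video-only, or audio-visual input through the same interface, which is one reason this style of fusion suits noisy conditions where one stream may be unreliable.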

Model Features

Multilingual Support
Supports multiple languages including Arabic, German, Greek, English, Spanish, French, Italian, Portuguese, and Russian.
Audio-Visual Integration
Combines audio and visual modalities to enhance speech recognition performance in noisy environments.
Pre-trained Model
Provides pre-trained models fine-tuned on the MuAViC dataset for quick deployment.

Model Capabilities

Audio-Visual Speech Recognition
Multilingual Speech Recognition
Speech Recognition in Noisy Environments

Use Cases

Speech Recognition
Multilingual Speech Transcription
Convert speech in multiple languages to text
Speech Recognition in Noisy Environments
Perform speech recognition in environments with significant background noise; incorporating visual information improves recognition accuracy.