AV-HuBERT

Developed by nguyenvulebinh
A multilingual audio-visual speech recognition model trained on the MuAViC dataset, combining audio and visual modalities for robust recognition
Downloads 683
Release Time: 8/30/2024

Model Overview

AV-HuBERT is a self-supervised model for audio-visual speech recognition. It integrates audio and visual modalities to achieve robust performance, particularly in noisy environments.

Model Features

Multimodal Fusion
Processes both audio and video inputs simultaneously, leveraging lip movement information to enhance speech recognition
Multilingual Support
Supports multiple languages, including Arabic, German, Greek, English, Spanish, French, Italian, Portuguese, and Russian
Noise Robustness
Improves recognition accuracy in noisy environments by supplementing audio signals with visual information
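
To make the multimodal fusion idea concrete, here is a minimal sketch of frame-level feature concatenation, one common way to combine audio and visual streams. It is an illustration only, not the model's actual implementation: the function name fuse_frames, the feature sizes, and the ratio of four audio frames per video frame are all assumptions chosen for the example.

```python
from typing import List

Frame = List[float]


def fuse_frames(audio: List[Frame], video: List[Frame],
                audio_per_video: int = 4) -> List[Frame]:
    """Concatenate each video frame with the audio frames that overlap it.

    audio_per_video is a hypothetical ratio (e.g. 100 audio frames/s
    against 25 video frames/s); adjust it to the real frame rates.
    """
    # Use only as many steps as both streams can fully cover.
    steps = min(len(audio) // audio_per_video, len(video))
    fused: List[Frame] = []
    for t in range(steps):
        window = audio[t * audio_per_video:(t + 1) * audio_per_video]
        # Flatten the audio window into one vector, then append lip features.
        stacked_audio = [value for frame in window for value in frame]
        fused.append(stacked_audio + list(video[t]))
    return fused


# Toy input: 8 audio frames (2 features each), 2 video frames (3 features each).
audio = [[float(i), float(i) + 0.5] for i in range(8)]
video = [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
fused = fuse_frames(audio, video)
```

Each fused frame carries both modalities, so a downstream recognizer can still rely on the lip features when the audio part is corrupted by noise.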

Model Capabilities

Audio-Visual Speech Recognition
Multilingual Speech-to-Text
Speech Processing in Noisy Environments
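
One way to probe the noisy-environment capability is to mix noise into clean speech at a controlled signal-to-noise ratio (SNR) and compare transcripts. The sketch below shows only the mixing step, using the standard SNR definition; the helper names power and mix_at_snr and the toy signals are assumptions made for illustration and are not part of the model's tooling.

```python
import math
from typing import List


def power(signal: List[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech: List[float], noise: List[float],
               snr_db: float) -> List[float]:
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals: a sine wave as "speech" and an alternating sequence as "noise".
speech = [math.sin(2.0 * math.pi * 5.0 * t / 100.0) for t in range(100)]
noise = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
mixed = mix_at_snr(speech, noise, snr_db=10.0)
```

Lower snr_db values give a harder test; running the same clip at several SNR levels shows how quickly accuracy degrades with and without the video stream.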

Use Cases

Speech Recognition
Meeting Transcription
Automatically generates transcripts during video conferences
Improves recognition accuracy in noisy environments
Accessibility Applications
Provides real-time captioning for people who are deaf or hard of hearing
Enhances comprehension by incorporating lip movement information
Education
Language Learning
Helps learners improve pronunciation by observing lip movements
Provides more accurate pronunciation feedback