AV HuBERT MuAViC Multilingual

Developed by nguyenvulebinh
An audio-visual speech recognition model trained on the MuAViC dataset. It combines audio and visual modalities to improve recognition performance in noisy environments.
Downloads 165
Release Time: 3/6/2025

Model Overview

AV-HuBERT is a self-supervised model for audio-visual speech recognition. It uses both the audio signal and the visual signal (lip movement) to transcribe speech, and it performs especially well in noisy environments.

Model Features

Multimodal fusion
Uses both audio and visual (lip movement) information for speech recognition.
Multilingual support
Supports recognition of 9 languages, including English, French, and Russian.
Noise robustness
Maintains high recognition accuracy in noisy environments.
Pretrained model
Provides a pretrained model fine-tuned on the MuAViC dataset.
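
Noise robustness is typically checked by mixing noise into clean speech at a fixed signal-to-noise ratio (SNR) and comparing transcription accuracy against the clean input. The following is a minimal sketch of that mixing step only, not part of this model's tooling: the function names mix_at_snr and measured_snr_db are hypothetical, and the "speech" is a synthetic tone rather than real audio.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR (dB)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / P_scaled_noise); solve for the noise gain.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    speech_power = sum(s * s for s in speech) / len(speech)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(speech_power / residual_power)


rng = random.Random(0)
# Synthetic stand-ins (assumptions): a 220 Hz tone sampled at 16 kHz as
# "speech", and Gaussian noise as the interfering signal.
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=0.0)
```

In an evaluation, the same utterance would be mixed at several SNR levels and transcribed with and without the video stream, to see how much the visual modality helps as the audio degrades.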

Model Capabilities

Audio-visual speech recognition
Multilingual speech transcription
Speech processing in noisy environments

Use Cases

Speech recognition
Meeting minutes
Accurately transcribe speech in noisy meeting environments.
Improve recognition accuracy by combining visual information.
Video subtitle generation
Automatically generate subtitles for video content.
Improve transcription quality by utilizing lip movement information.
Assistive technology
Hearing assistance
Help people with hearing impairments understand speech content.
Supplement audio information with visual information.