Wav2vec2 Xlsr Multilingual 53 Fa

Developed by masoudmzb
A Persian speech recognition model built on the wav2vec 2.0 architecture. It is fine-tuned from the multilingual XLSR-53 checkpoint and reports a lower word error rate than the base model.
Downloads 83
Release Time: 3/2/2022

Model Overview

This speech recognition model is fine-tuned from facebook/wav2vec2-large-xlsr-53 on Persian datasets. It expects audio sampled at 16 kHz and is intended for Persian automatic speech recognition (ASR).
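The following is a minimal inference sketch, not the author's official usage code. It assumes the checkpoint is published on the Hugging Face Hub under the ID masoudmzb/wav2vec2-xlsr-multilingual-53-fa (inferred from the title and developer name above, not confirmed by this page) and that the transformers and torch packages are installed. The helper name transcribe is hypothetical.

```python
# Assumed Hub ID, inferred from the page title and developer name.
MODEL_ID = "masoudmzb/wav2vec2-xlsr-multilingual-53-fa"
EXPECTED_RATE = 16_000  # the model expects 16 kHz audio


def transcribe(waveform, sample_rate, model_id=MODEL_ID):
    """Transcribe a mono waveform (sequence of floats) sampled at 16 kHz."""
    if sample_rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {sample_rate} Hz; resample first"
        )

    # Heavy imports are deferred so the rate check works without them.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(
        waveform, sampling_rate=sample_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: pick the most likely token at each frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling transcribe(samples, 16000) downloads the checkpoint on first use. Audio recorded at any other rate is rejected early rather than silently mis-transcribed.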

Model Features

Multilingual pre-training foundation
Fine-tuned from the XLSR-53 multilingual model, so it benefits from cross-lingual representations learned during pre-training.
High-performance Persian recognition
Achieves a word error rate (WER) of 0.408 (40.8%) on private test sets, significantly outperforming the base model.
Benefit of additional data
Trained on Common Voice and self-built datasets; the larger data volume leads to better performance.
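For readers unfamiliar with the metric, the sketch below shows how a word error rate such as the 0.408 quoted above is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It is a generic illustration, not the evaluation script used for this model; the function name and example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("e") over 5 reference words.
print(word_error_rate("a b c d e", "a x c d"))  # 0.4
```

On this scale, a WER of 0.408 corresponds to roughly 41 word errors per 100 reference words.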

Model Capabilities

Persian speech recognition
16 kHz audio processing
End-to-end speech-to-text
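Because the model expects 16 kHz input, audio recorded at other rates has to be resampled first. The sketch below illustrates the idea with plain linear interpolation in pure Python. It is an assumption-laden illustration only: the function name is hypothetical, and linear interpolation applies no anti-aliasing filter, so a dedicated resampler from an audio library is usually preferable in practice, especially when downsampling.

```python
TARGET_RATE = 16_000  # sampling rate expected by the model


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (no low-pass filter)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)

    n_out = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # source samples per output sample
    last = len(samples) - 1

    out = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        # Blend the two neighbouring source samples.
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# One second of 48 kHz audio becomes 16,000 samples.
one_second = [float(n) for n in range(48_000)]
print(len(resample_linear(one_second, 48_000)))  # 16000
```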

Use Cases

Speech transcription
Persian speech transcription: converts Persian speech into text (reported word error rate: 0.408).
Voice assistants
Persian voice interaction: provides speech recognition for Persian voice assistants.