Wav2vec2 Large Xlsr Persian Shemo

Developed by m3hrdadfi
An automatic speech recognition model based on Wav2Vec2-Large-XLSR-53 and fine-tuned on the Persian ShEMO dataset
Downloads 28
Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Persian (Farsi). It is based on Facebook's Wav2Vec2-Large-XLSR-53 model and fine-tuned on ShEMO, a Persian emotional speech dataset, which makes it suitable for Persian speech-to-text tasks.

Model Features

Persian optimization
Tuned for the characteristics of Persian speech, including Persian-specific text normalization
Emotional speech recognition
Fine-tuned on the ShEMO emotional speech dataset, so it targets Persian speech with emotional content
No language model required
Can be used directly, without an additional language model
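
Decoding without a language model usually means greedy CTC decoding: take the best-scoring token in each audio frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python. It assumes the model exposes a CTC head, as fine-tuned Wav2Vec2 checkpoints typically do; the vocabulary, the blank index, and the function name `ctc_greedy_decode` are hypothetical and are not taken from this model.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_scores holds one row of per-token scores for each audio frame.
    vocab and blank_id are illustrative assumptions, not this model's values.
    """
    # Step 1: index of the best-scoring token in every frame.
    best_ids = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]

    # Steps 2 and 3: emit a token only when it differs from the previous
    # frame's token and is not the blank symbol.
    pieces: List[str] = []
    previous = None
    for token_id in best_ids:
        if token_id != previous and token_id != blank_id:
            pieces.append(vocab[token_id])
        previous = token_id
    return "".join(pieces)


if __name__ == "__main__":
    toy_vocab = ["<pad>", "a", "b"]  # index 0 acts as the CTC blank here
    toy_scores = [
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.7, 0.2],    # a (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank (separates the two a's)
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.2, 0.7],    # b
        [0.0, 0.1, 0.9],    # b (repeat, collapsed)
    ]
    print(ctc_greedy_decode(toy_scores, toy_vocab))  # aab
```

The blank frame between the two runs of `a` is what lets a doubled letter survive the collapse step.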

Model Capabilities

Persian speech recognition
Emotional speech processing
16kHz audio processing
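
A minimal transcription sketch, assuming the model is published on the Hugging Face Hub and that `torch`, `torchaudio`, and `transformers` are installed. The model ID below is an assumption built from the developer name and model title on this page, and `transcribe` is a hypothetical helper name. The sketch converts the audio to mono, resamples it to 16 kHz, and decodes greedily without a language model. It does not reproduce the Persian-specific text normalization mentioned above.

```python
TARGET_SAMPLE_RATE = 16_000  # the model processes 16 kHz audio

# Assumed Hub ID, inferred from the developer and model name on this page.
MODEL_ID = "m3hrdadfi/wav2vec2-large-xlsr-persian-shemo"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding (no language model)."""
    # Heavy dependencies are imported lazily so that defining this function
    # does not require them to be installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # torchaudio returns a (channels, samples) tensor and the file's sample rate.
    waveform, sample_rate = torchaudio.load(audio_path)
    waveform = waveform.mean(dim=0)  # down-mix to mono
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(),
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: best token per frame, then CTC collapse in the tokenizer.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path; replace it with a real recording.
    print(transcribe("sample.wav"))
```

Loading the processor and model on every call is wasteful for batch work; in practice both would be created once and reused.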

Use Cases

Speech-to-text
Persian speech transcription
Convert Persian speech into text
Achieved a 31% word error rate (WER) on the ShEMO test set
Emotional speech analysis
Transcribe Persian speech that carries emotional tone
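
For context on the WER figure reported above: word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; `word_error_rate` is a hypothetical helper, not the evaluation script used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] is the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

A corpus-level WER is normally computed by summing the edit distances over all utterances and dividing by the total number of reference words, not by averaging per-utterance rates.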