# Automatic Speech Recognition Model
This model is a fine-tuned speech recognition model that addresses the challenge of transcribing speech in the Hausa language. It offers high-quality automatic speech recognition by taking a multilingual pre-trained model and fine-tuning it on a Hausa dataset.
## 🚀 Quick Start
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - HA dataset.
Its results on the evaluation set are reported in the Training results table under Technical Details below.
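As a minimal usage sketch, the model can be loaded through the 🤗 Transformers ASR pipeline. The model ID below is a placeholder; substitute this repository's actual name on the Hugging Face Hub.

```python
from transformers import pipeline

# Placeholder model ID; replace with the actual repository name.
asr = pipeline("automatic-speech-recognition", model="<username>/wav2vec2-xls-r-300m-ha")

# Transcribe an audio file of Hausa speech; the pipeline decodes and
# resamples the input to the 16 kHz rate the model expects.
result = asr("sample_hausa.wav")
print(result["text"])
```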
## ✨ Features
- Fine-Tuned: Based on the pre-trained [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m), fine-tuned on the HA subset of MOZILLA-FOUNDATION/COMMON_VOICE_8_0 (a loading sketch follows this list).
- Multilingual Adaptability: Builds on a multilingual pre-trained model, so it is potentially adaptable to other languages with further fine-tuning.
- Performance Metrics: Validation loss and WER (Word Error Rate) are reported at regular checkpoints during training; see the Training results table below.
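As a sketch, the Hausa subset of Common Voice 8.0 can be loaded as follows. Access requires accepting the dataset's terms on the Hugging Face Hub; the authentication flag reflects the `datasets` API of the era listed under Framework versions.

```python
from datasets import load_dataset, Audio

# Load the Hausa ("ha") training split of Common Voice 8.0.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_8_0", "ha", split="train", use_auth_token=True
)

# wav2vec2-xls-r-300m expects 16 kHz audio; Common Voice ships at 48 kHz,
# so resample on the fly via the Audio feature.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))
```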
## 📚 Documentation
### Model description
More information needed
### Intended uses & limitations
More information needed
### Training and evaluation data
More information needed
## 🔧 Technical Details
### Training procedure
#### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 9.6e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 80.0
- mixed_precision_training: Native AMP
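For illustration, the hyperparameters above map roughly onto the following 🤗 Transformers `TrainingArguments`; this is a sketch, not the exact training script, and the output directory is a placeholder.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-ha",  # placeholder output path
    learning_rate=9.6e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=80.0,
    fp16=True,  # native AMP mixed-precision training
)
```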
#### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 3.0021        | 8.33  | 500  | 2.9059          | 1.0    |
| 2.6604        | 16.66 | 1000 | 2.6402          | 0.9892 |
| 1.2216        | 24.99 | 1500 | 0.6051          | 0.6851 |
| 1.0754        | 33.33 | 2000 | 0.5408          | 0.6464 |
| 0.9582        | 41.66 | 2500 | 0.5521          | 0.5935 |
| 0.8653        | 49.99 | 3000 | 0.5156          | 0.5550 |
| 0.7867        | 58.33 | 3500 | 0.5439          | 0.5606 |
| 0.7265        | 66.66 | 4000 | 0.4863          | 0.5255 |
| 0.6699        | 74.99 | 4500 | 0.5050          | 0.5169 |
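The Wer column is the Word Error Rate: the fraction of reference words the model substitutes, inserts, or deletes. As a sketch, it can be computed with the `evaluate` library (the example strings are hypothetical; at the time of the framework versions below, `datasets.load_metric("wer")` filled the same role):

```python
import evaluate

# WER = (substitutions + insertions + deletions) / number of reference words
wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["ina son ruwa"],      # hypothetical model transcription
    references=["ina son ruwan sha"],  # hypothetical ground-truth transcript
)
print(f"WER: {wer:.2f}")
```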
### Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu113
- Datasets 1.18.4.dev0
- Tokenizers 0.11.0
## 📄 License
This model is licensed under the Apache-2.0 license.