# 🚀 Akashpb13/Hausa_xlsr
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m). It is designed for automatic speech recognition in the Hausa language, reaching a WER of 0.206 and a CER of 0.044 on the Common Voice 8.0 test set (see the Model Index below).
## 🚀 Quick Start
### Evaluation

To evaluate the model on mozilla-foundation/common_voice_8_0 with the test split, run:

```bash
python eval.py --model_id Akashpb13/Hausa_xlsr --dataset mozilla-foundation/common_voice_8_0 --config ha --split test
```
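A minimal programmatic alternative is sketched below, assuming the standard `transformers` pipeline API and the `jiwer` library. It is not the official eval.py and skips any text normalization that script applies, so the resulting WER can differ from the reported numbers:

```python
import jiwer
from datasets import Audio, load_dataset
from transformers import pipeline

# Common Voice 8.0 is gated: accepting its terms on the Hugging Face Hub
# and logging in is required before load_dataset will succeed.
ds = load_dataset("mozilla-foundation/common_voice_8_0", "ha", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # wav2vec2 expects 16 kHz

asr = pipeline("automatic-speech-recognition", model="Akashpb13/Hausa_xlsr")

# Transcribe each clip and score the hypotheses against the references.
predictions = [asr(sample["array"])["text"] for sample in ds["audio"]]
print("WER:", jiwer.wer(list(ds["sentence"]), predictions))
```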
## ✨ Features
- Fine-tuned model: based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m), fine-tuned for Hausa automatic speech recognition.
- Strong performance: achieves low WER (Word Error Rate) and CER (Character Error Rate) on the evaluation sets (see the Model Index below).
## 📦 Installation
The original model card provides no installation steps.
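As a minimal sketch, an environment for the examples in this card can be set up as follows; the package list is an assumption, and the pinned versions actually used for training are listed under Framework versions below:

```bash
pip install transformers datasets torch torchaudio jiwer
```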
## 💻 Usage Examples
The original model card provides no code examples.
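As a hedged sketch using the standard `transformers` CTC API (the audio file name is a placeholder, not from the original card), transcription looks like this:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("Akashpb13/Hausa_xlsr")
model = Wav2Vec2ForCTC.from_pretrained("Akashpb13/Hausa_xlsr")

# Load an audio clip and resample it to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("sample.wav")  # hypothetical file
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: pick the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```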
## 📚 Documentation
### Model description

This model fine-tunes [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the Hausa Common Voice data described below.
### Intended uses & limitations

More information needed
### Training and evaluation data

- Training data: Common Voice Hausa train.tsv, dev.tsv, invalidated.tsv, reported.tsv, and other.tsv. All of the Common Voice 7.0 datasets were concatenated, only clips whose upvotes exceeded their downvotes were kept, and duplicates were removed. A preparation sketch follows the training-procedure description below.
### Training procedure

For the training dataset, all available splits were concatenated as described above and a 90-10 train/evaluation split was applied.
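A minimal sketch of that preparation, assuming the TSVs follow the standard Common Voice column layout (up_votes, down_votes, sentence); the actual preprocessing script is not part of the original card:

```python
import pandas as pd
from datasets import Dataset

# Concatenate all Common Voice Hausa splits listed above.
files = ["train.tsv", "dev.tsv", "invalidated.tsv", "reported.tsv", "other.tsv"]
df = pd.concat([pd.read_csv(f, sep="\t") for f in files], ignore_index=True)

# Keep only clips with more upvotes than downvotes, then drop duplicates
# (column names assume the standard Common Voice TSV layout).
df = df[df["up_votes"] > df["down_votes"]].drop_duplicates()

# 90-10 train/evaluation split, as described above; reusing seed 13 from the
# training hyperparameters here is an assumption.
splits = Dataset.from_pandas(df, preserve_index=False).train_test_split(test_size=0.1, seed=13)
train_ds, eval_ds = splits["train"], splits["test"]
```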
### Training hyperparameters

The following hyperparameters were used during training:
| Property | Details |
|----------|---------|
| learning_rate | 0.000096 |
| train_batch_size | 16 |
| eval_batch_size | 16 |
| seed | 13 |
| gradient_accumulation_steps | 2 |
| lr_scheduler_type | cosine_with_restarts |
| lr_scheduler_warmup_steps | 500 |
| num_epochs | 50 |
| mixed_precision_training | Native AMP |
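These settings map directly onto `transformers.TrainingArguments`; the following is a minimal sketch, where the output directory and any omitted arguments are assumptions, since the original training script is not included in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./hausa_xlsr",            # hypothetical path
    learning_rate=0.000096,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=500,
    num_train_epochs=50,
    fp16=True,                            # mixed precision via native AMP
)
```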
### Training results
| Step | Training Loss | Validation Loss | WER |
|------|---------------|-----------------|-----|
| 500 | 5.175900 | 2.750914 | 1.000000 |
| 1000 | 1.028700 | 0.338649 | 0.497999 |
| 1500 | 0.332200 | 0.246896 | 0.402241 |
| 2000 | 0.227300 | 0.239640 | 0.395839 |
| 2500 | 0.175000 | 0.239577 | 0.373966 |
| 3000 | 0.140400 | 0.243272 | 0.356095 |
| 3500 | 0.119200 | 0.263761 | 0.365164 |
| 4000 | 0.099300 | 0.265954 | 0.353428 |
| 4500 | 0.084400 | 0.276367 | 0.349693 |
| 5000 | 0.073700 | 0.282631 | 0.343825 |
| 5500 | 0.068000 | 0.282344 | 0.341158 |
| 6000 | 0.064500 | 0.281591 | 0.342491 |
### Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.18.3
- Tokenizers 0.10.3
### Model Index

| Task | Dataset | Metrics (WER) | Metrics (CER) |
|------|---------|---------------|---------------|
| Automatic Speech Recognition | Common Voice 8 (mozilla-foundation/common_voice_8_0, args: ha) | 0.20614541257934219 | 0.04358048053214061 |
| Automatic Speech Recognition | Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, args: ha) | 0.20614541257934219 | 0.04358048053214061 |
## 🔧 Technical Details

No further technical implementation details are provided in the original model card.
## 📄 License

The model is released under the Apache-2.0 license.