Cdial/Hausa_xlsr
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for automatic speech recognition, achieving strong results on Hausa-language datasets.
Quick Start
Evaluation
To evaluate the model on the `test` split of mozilla-foundation/common_voice_8_0:

```shell
python eval.py --model_id Akashpb13/Hausa_xlsr --dataset mozilla-foundation/common_voice_8_0 --config ha --split test
```
Features
- Fine-Tuned Model: Based on facebook/wav2vec2-xls-r-300m, fine-tuned for better performance on Hausa language tasks.
- High Performance: Achieves strong WER and CER scores on the relevant evaluation datasets.
Usage Examples
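As a minimal inference sketch (not from the original card), the checkpoint can be used with 🤗 Transformers. This assumes the Hub model id `Akashpb13/Hausa_xlsr` (the id used in the evaluation command) and a mono WAV input; `transcribe` is an illustrative helper name, and the heavy imports are kept inside it so the file can be read or imported without the dependencies installed.

```python
TARGET_SAMPLE_RATE = 16_000  # wav2vec2-xls-r checkpoints expect 16 kHz audio


def transcribe(wav_path: str, model_id: str = "Akashpb13/Hausa_xlsr") -> str:
    """Load the fine-tuned checkpoint and transcribe one audio file."""
    # Imports kept local: requires torch, torchaudio, and transformers.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    waveform, sample_rate = torchaudio.load(wav_path)
    if sample_rate != TARGET_SAMPLE_RATE:
        # Resample to the rate the model was trained on.
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.squeeze(0), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Usage would look like `print(transcribe("sample_ha.wav"))` for some local Hausa recording.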
Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Name | Cdial/Hausa_xlsr |
| Base Model | facebook/wav2vec2-xls-r-300m |
| Language | Hausa (ha) |
| Task | Automatic Speech Recognition |
| License | Apache-2.0 |
| Tags | automatic-speech-recognition, mozilla-foundation/common_voice_8_0, generated_from_trainer, ha, robust-speech-event, model_for_talk, hf-asr-leaderboard |
| Datasets | mozilla-foundation/common_voice_8_0 |
Evaluation Results
The model achieves the following results on different evaluation sets:
- Common Voice 8 (mozilla-foundation/common_voice_8_0, ha):
  - Test WER: 0.20614541257934219
  - Test CER: 0.04358048053214061
- Robust Speech Event - Dev Data (speech-recognition-community-v2/dev_data, ha):
  - Test WER: 0.20614541257934219
  - Test CER: 0.04358048053214061
On the evaluation set (a 10% split of the training data, where the training data is the train set merged with the invalidated, reported, other, and dev sets):
- Loss: 0.275118
- WER: 0.329955
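For reference, the WER and CER figures above are normalized Levenshtein edit distances computed over words and characters, respectively. A stdlib-only sketch of the metrics (not the exact script used to produce the numbers):

```python
def _edit_distance(ref, hyp):
    """Classic dynamic-programming Levenshtein distance between two sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if tokens match)
            )
            prev, dp[j] = dp[j], cur
    return dp[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return _edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    return _edit_distance(reference, hypothesis) / len(reference)
```

So a WER of ~0.206 means roughly one word in five needs an edit to match the reference transcript.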
Model Description
The model is facebook/wav2vec2-xls-r-300m fine-tuned on Hausa speech data.
Intended Uses & Limitations
More information needed
Training and Evaluation Data
- Training Data: Common Voice Hausa train.tsv, dev.tsv, invalidated.tsv, reported.tsv, and other.tsv. The files given in Common Voice 7.0 were concatenated, duplicates were removed, and only utterances with more upvotes than downvotes were kept.
Training Procedure
- Dataset Creation: All available datasets were concatenated and a 90-10 train/evaluation split was applied.
- Training Hyperparameters:
- learning_rate: 0.000096
- train_batch_size: 16
- eval_batch_size: 16
- seed: 13
- gradient_accumulation_steps: 2
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 500
- num_epochs: 50
- mixed_precision_training: Native AMP
- Training Results:
| Step | Training Loss | Validation Loss | Wer |
|------|---------------|-----------------|----------|
| 500 | 5.175900 | 2.750914 | 1.000000 |
| 1000 | 1.028700 | 0.338649 | 0.497999 |
| 1500 | 0.332200 | 0.246896 | 0.402241 |
| 2000 | 0.227300 | 0.239640 | 0.395839 |
| 2500 | 0.175000 | 0.239577 | 0.373966 |
| 3000 | 0.140400 | 0.243272 | 0.356095 |
| 3500 | 0.119200 | 0.263761 | 0.365164 |
| 4000 | 0.099300 | 0.265954 | 0.353428 |
| 4500 | 0.084400 | 0.276367 | 0.349693 |
| 5000 | 0.073700 | 0.282631 | 0.343825 |
| 5500 | 0.068000 | 0.282344 | 0.341158 |
| 6000 | 0.064500 | 0.281591 | 0.342491 |
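The dataset-creation step above (concatenate everything, then split 90-10) can be sketched as follows. This is an assumption-laden illustration, not the original code; in particular, using the listed seed (13) for the shuffle is my assumption.

```python
import random


def split_90_10(rows, seed=13):
    """Shuffle deterministically and return (train, eval) at a 90-10 ratio."""
    rows = list(rows)
    rng = random.Random(seed)  # seed 13, as listed in the hyperparameters
    rng.shuffle(rows)
    cut = int(0.9 * len(rows))
    return rows[:cut], rows[cut:]
```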
Framework Versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.18.3
- Tokenizers 0.10.3
Technical Details
The model is a fine-tuned version of facebook/wav2vec2-xls-r-300m: fine-tuning adjusts the pretrained model's parameters on Hausa speech data to improve its automatic speech recognition performance.
License
This model is released under the Apache-2.0 license.