Wav2vec2-large-xls-r-300m-ha-cv8 Open-source Model - Effortlessly Achieve Hausa Speech Recognition

Wav2vec2 Large Xls R 300m Ha Cv8

Developed by anuragshas

A Hausa speech recognition model based on facebook/wav2vec2-xls-r-300m and fine-tuned on the Common Voice dataset

Speech Recognition

Transformers

Open Source License: Apache-2.0

Tags: #Hausa Speech Recognition #Low-resource Language ASR #Wav2Vec2 Fine-tuning

Downloads 17

Release Time : 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model optimized for Hausa, based on the XLS-R-300M architecture, fine-tuned on the Common Voice 8.0 Hausa dataset.

Model Features

Hausa Optimization

Specially fine-tuned and optimized for Hausa speech recognition tasks

Based on XLS-R Architecture

Uses Facebook's XLS-R-300M pre-trained model as the foundation

Word Error Rate

Achieves a WER of 36.295% on the Common Voice 8 test set with a language model (47.821% without one)

Model Capabilities

Hausa Speech Recognition

Audio-to-Text Conversion

Speech Transcription

Use Cases

Speech Transcription

Hausa Speech-to-Text

Convert Hausa speech content into text

Test set WER 36.295%

Voice Assistants

Hausa Voice Interaction

Supports Hausa voice command recognition

🚀 XLS-R-300M - Hausa

This model is facebook/wav2vec2-xls-r-300m fine-tuned on the common_voice dataset for Hausa speech recognition. Training details and evaluation results are listed in the sections that follow.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset. It achieves the following results on the evaluation set:

Loss: 0.6094
Wer: 0.5234
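
The Wer figure is a word error rate expressed as a fraction, so 0.5234 corresponds to roughly 52% of reference words being wrong. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), it could be computed as follows. The function name and the example strings are hypothetical and are not taken from the model card or its evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings: one substitution and one deletion over four words.
example = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.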

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 8
seed: 13
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine_with_restarts
lr_scheduler_warmup_steps: 1000
num_epochs: 100
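
To make the listed values concrete, here is a rough sketch of how the effective batch size and the learning-rate curve follow from them. It assumes a single training device, a single cosine cycle, and about 6,100 optimizer steps in total; none of these three values is stated in the card. The formula mirrors the usual linear-warmup plus cosine-with-hard-restarts shape and is not the author's training code.

```python
import math

LEARNING_RATE = 1e-4      # learning_rate
TRAIN_BATCH_SIZE = 16     # train_batch_size (assumed to be per device)
GRAD_ACCUM_STEPS = 2      # gradient_accumulation_steps
WARMUP_STEPS = 1000       # lr_scheduler_warmup_steps
TOTAL_STEPS = 6100        # ASSUMPTION: total optimizer steps, not in the card
NUM_CYCLES = 1            # ASSUMPTION: number of cosine cycles, not in the card

# With one device, 16 samples x 2 accumulation steps gives the listed total of 32.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under the assumptions above."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that restarts at the peak at the start of each cycle.
    cosine = 0.5 * (1.0 + math.cos(math.pi * ((NUM_CYCLES * progress) % 1.0)))
    return LEARNING_RATE * max(0.0, cosine)
```

Under these assumptions the rate climbs linearly for the first 1,000 steps, peaks at 0.0001, and then decays along a cosine curve toward zero.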

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
2.9599	6.56	400	2.8650	1.0
2.7357	13.11	800	2.7377	0.9951
1.3012	19.67	1200	0.6686	0.7111
1.0454	26.23	1600	0.5686	0.6137
0.9069	32.79	2000	0.5576	0.5815
0.82	39.34	2400	0.5502	0.5591
0.7413	45.9	2800	0.5970	0.5586
0.6872	52.46	3200	0.5817	0.5428
0.634	59.02	3600	0.5636	0.5314
0.6022	65.57	4000	0.5780	0.5229
0.5705	72.13	4400	0.6036	0.5323
0.5408	78.69	4800	0.6119	0.5336
0.5225	85.25	5200	0.6105	0.5270
0.5265	91.8	5600	0.6034	0.5231
0.5154	98.36	6000	0.6094	0.5234
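
The table can be read programmatically to see where validation loss and WER bottom out. The snippet below is only a small sketch over the rows copied from the table; the variable names are illustrative.

```python
# (step, epoch, validation_loss, wer) copied from the training results table
rows = [
    (400, 6.56, 2.8650, 1.0),
    (800, 13.11, 2.7377, 0.9951),
    (1200, 19.67, 0.6686, 0.7111),
    (1600, 26.23, 0.5686, 0.6137),
    (2000, 32.79, 0.5576, 0.5815),
    (2400, 39.34, 0.5502, 0.5591),
    (2800, 45.9, 0.5970, 0.5586),
    (3200, 52.46, 0.5817, 0.5428),
    (3600, 59.02, 0.5636, 0.5314),
    (4000, 65.57, 0.5780, 0.5229),
    (4400, 72.13, 0.6036, 0.5323),
    (4800, 78.69, 0.6119, 0.5336),
    (5200, 85.25, 0.6105, 0.5270),
    (5600, 91.8, 0.6034, 0.5231),
    (6000, 98.36, 0.6094, 0.5234),
]

best_wer_row = min(rows, key=lambda r: r[3])    # checkpoint with the lowest WER
best_loss_row = min(rows, key=lambda r: r[2])   # checkpoint with the lowest validation loss
steps_per_epoch = rows[-1][0] / rows[-1][1]     # optimizer steps per epoch, from the last row
```

Validation loss is lowest at step 2400, while WER keeps improving until step 4000 and then plateaus near 0.52 to 0.53. The reported final figures (loss 0.6094, WER 0.5234) come from the last row at step 6000. About 61 steps per epoch, combined with the total batch size of 32, would suggest roughly 1,950 training utterances per epoch, though that is an estimate and not a figure from the card.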

Framework versions

Transformers 4.16.1
Pytorch 1.10.0+cu111
Datasets 1.18.2
Tokenizers 0.11.0

Evaluation Commands

To evaluate on mozilla-foundation/common_voice_8_0 with the test split:

python eval.py --model_id anuragshas/wav2vec2-large-xls-r-300m-ha-cv8 --dataset mozilla-foundation/common_voice_8_0 --config ha --split test

💻 Usage Examples

Basic Usage

import torch
import torchaudio.functional as F
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor

model_id = "anuragshas/wav2vec2-large-xls-r-300m-ha-cv8"

# Stream a single test sample; use_auth_token=True passes your stored Hugging Face token.
sample_iter = iter(load_dataset("mozilla-foundation/common_voice_8_0", "ha", split="test", streaming=True, use_auth_token=True))
sample = next(sample_iter)

# Resample the 48 kHz Common Voice audio to the 16 kHz rate the model works with.
resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()

model = AutoModelForCTC.from_pretrained(model_id)
# Decoding through .text relies on the language-model processor (needs pyctcdecode and kenlm).
processor = AutoProcessor.from_pretrained(model_id)

input_values = processor(resampled_audio, sampling_rate=16_000, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits

# batch_decode returns one transcription per input; take the first.
transcription = processor.batch_decode(logits.numpy()).text[0]
# => "kakin hade ya ke da kyautar"
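
For intuition about what decoding without a language model involves, here is a toy sketch of greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, and drop the blank token. It is an illustration of the general technique under assumed inputs, not the code path used by the processor above. The vocabulary, the blank id, and the frame sequence are all hypothetical.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions and remove CTC blanks."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary and per-frame argmax ids.
toy_vocab = {0: "<pad>", 1: "k", 2: "a", 3: " "}
toy_frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 1, 0, 2]
decoded = ctc_greedy_decode(toy_frames, toy_vocab)
```

A blank between two identical ids is what lets CTC produce a genuine double letter, which is why blanks are removed only after repeats are merged.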

Advanced Usage

Evaluation results on the Common Voice 8 "test" split (WER, %):

Without LM	With LM (run `./eval.py`)
47.821	36.295
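
The size of the language-model gain can be worked out directly from the two figures above. This is simple arithmetic on the reported numbers, shown as a short sketch:

```python
wer_without_lm = 47.821  # percent, from the results above
wer_with_lm = 36.295     # percent, from the results above

absolute_gain = wer_without_lm - wer_with_lm   # percentage points
relative_gain = absolute_gain / wer_without_lm # fraction of the no-LM error removed

print(f"absolute: {absolute_gain:.3f} points, relative: {relative_gain:.1%}")
# prints "absolute: 11.526 points, relative: 24.1%"
```

In other words, adding the language model removes about a quarter of the word errors on this test split.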

📄 License

This project is licensed under the Apache-2.0 license.