# 🚀 QuartzNet 15x5 CTC Bambara
`stt-bm-quartznet15x5-V0` is a fine-tuned version of NVIDIA's stt_fr_quartznet15x5, optimized for Bambara Automatic Speech Recognition (ASR). It uses a character encoding scheme and transcribes text in the standard character set found in the training data.
## 🚀 Quick Start
`stt-bm-quartznet15x5-V0` is a fine-tuned version of NVIDIA's stt_fr_quartznet15x5, optimized for Bambara ASR. The model does not produce punctuation or capitalization. It uses a character encoding scheme and transcribes text in the character set of the training split of the bam-asr-all dataset. The model was fine-tuned with NVIDIA NeMo and trained with CTC (Connectionist Temporal Classification) loss.
## ⚠️ Important Note
This model, along with its associated resources, is part of an ongoing research effort. Improvements and refinements are expected in future versions. Users should be aware that:
- The model may not generalize very well across all speaking conditions and dialects.
- Community feedback is welcome, and contributions are encouraged to refine the model further.
## ✨ Features
- Optimized for Bambara Automatic Speech Recognition.
- Utilizes CTC Loss for training.
- Based on NVIDIA NeMo toolkit.
## 📦 Installation
To fine-tune or use the model, install NVIDIA NeMo. We recommend installing it after setting up the latest PyTorch version.
```bash
pip install nemo_toolkit['asr']
```
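If the installation succeeded, the ASR collection should import cleanly; a minimal sanity check (the printed version depends on your environment):

```python
# Confirm that NeMo and its ASR collection are importable
import nemo
import nemo.collections.asr as nemo_asr

print(nemo.__version__)
```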
## 💻 Usage Examples
### Basic Usage

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="RobotsMali/stt-bm-quartznet15x5")
```
### Advanced Usage

```python
asr_model.transcribe(['sample_audio.wav'])
```
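`transcribe` also accepts a list of several files at once; in recent NeMo releases a `batch_size` argument controls how many clips are decoded per forward pass (file names below are placeholders):

```python
# Batch transcription of multiple 16 kHz mono WAV files (paths are illustrative)
transcriptions = asr_model.transcribe(
    ["clip_01.wav", "clip_02.wav", "clip_03.wav"],
    batch_size=2,  # larger batches trade GPU memory for throughput
)
print(transcriptions)
```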
### Input

This model accepts 16 kHz mono-channel audio (WAV files) as input.
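If your recordings are not already 16 kHz mono, convert them first. A minimal sketch using the third-party librosa and soundfile packages (not installed by NeMo itself; file names are illustrative):

```python
# Resample and downmix an arbitrary audio file to 16 kHz mono WAV
import librosa
import soundfile as sf

audio, sr = librosa.load("original_recording.mp3", sr=16000, mono=True)
sf.write("sample_audio.wav", audio, sr)
```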
### Output
This model provides transcribed speech as a string for a given speech sample.
## 📚 Documentation
### Model Architecture

QuartzNet is a convolutional architecture consisting of 1D time-channel separable convolutions optimized for speech recognition. More information on QuartzNet is available in the NVIDIA NeMo documentation (QuartzNet Model).
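To make the idea concrete, a time-channel separable convolution factors a standard 1D convolution into a depthwise convolution along the time axis followed by a pointwise (1x1) convolution that mixes channels. A minimal PyTorch sketch (dimensions are illustrative, not the model's actual layer sizes):

```python
import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    """Depthwise conv over time + pointwise (1x1) conv over channels."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        # Depthwise: one filter per input channel, sliding along time
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels,
        )
        # Pointwise: recombines channels at every time step
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))

block = TimeChannelSeparableConv1d(64, 128, kernel_size=33)
out = block(torch.randn(8, 64, 200))  # -> shape (8, 128, 200)
```

This factorization is what keeps QuartzNet small relative to a full convolution with the same kernel size and channel counts.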
### Training

The NeMo toolkit was used to fine-tune this model for 25,939 steps starting from the stt_fr_quartznet15x5 model. The model was trained with this [base config](https://github.com/RobotsMali-AI/bambara-asr/blob/main/configs/quartznet-20m-config-v2.yaml). The full training configurations, scripts, and experiment logs are available here:

🔗 [Bambara-ASR Experiments](https://github.com/RobotsMali-AI/bambara-asr)
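For readers who want to reproduce a similar run, this is roughly how a NeMo CTC fine-tune is wired up. This is a sketch only: the vocabulary list, manifest path, and trainer settings below are illustrative, and the actual run used the linked base config.

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Start from the pretrained French checkpoint
model = nemo_asr.models.EncDecCTCModel.from_pretrained("stt_fr_quartznet15x5")

# Replace the decoder vocabulary with the Bambara character set
# (truncated, illustrative subset shown here)
model.change_vocabulary(new_vocabulary=[" ", "a", "b", "d", "e", "ɛ", "ɔ", "ɲ", "ŋ"])

# Point the model at a NeMo-style JSON manifest (hypothetical path)
model.setup_training_data({
    "manifest_filepath": "manifests/train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 32,
    "labels": model.decoder.vocabulary,  # reuse the vocabulary set above
})

trainer = pl.Trainer(max_steps=25939, accelerator="gpu", devices=1)
trainer.fit(model)
```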
### Dataset

This model was fine-tuned on the [bam-asr-early](https://huggingface.co/datasets/RobotsMali/bam-asr-early) dataset, which consists of 37 hours of transcribed Bambara speech. The dataset is primarily derived from the Jeli-ASR dataset (~87%).
### Performance
The performance of Automatic Speech Recognition models is measured using Word Error Rate (WER%).
| Version | Tokenizer | Vocabulary Size | bam-asr-all test set (WER %) |
|---------|-----------|-----------------|------------------------------|
| V2 | Character-wise | 45 | 46.5 |
These are greedy WER numbers without an external language model.
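For context, WER counts word-level substitutions (S), deletions (D), and insertions (I) against the number of reference words (N): WER = (S + D + I) / N. A quick way to score your own transcripts is the third-party jiwer package (the sentences below are illustrative):

```python
# WER = (substitutions + deletions + insertions) / reference word count
from jiwer import wer

reference = "ne bɛ taa sugu la"    # illustrative reference (5 words)
hypothesis = "ne bɛ ta sugu"       # illustrative output: 1 substitution + 1 deletion
print(f"WER: {wer(reference, hypothesis):.2%}")  # 2 / 5 = 40.00%
```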
## 📄 License
This model is released under the CC-BY-4.0 license. By using this model, you agree to the terms of the license.
More details are available in the Experimental Technical Report:
📄 [Draft Technical Report - Weights & Biases](https://wandb.ai/yacoudiarra-wl/bam-asr-nemo-training/reports/Draft-Technical-Report-V1--VmlldzoxMTIyOTMzOA).
Feel free to open a discussion on Hugging Face or [file an issue](https://github.com/RobotsMali-AI/bambara-asr/issues) on GitHub if you have any contributions.