titu_stt_bn_fastconformer
titu_stt_bn_fastconformer is a FastConformer-based model for Bangla automatic speech recognition, trained on a large-scale corpus to deliver high-quality transcription.
Quick Start
The titu_stt_bn_fastconformer model is designed for transcribing Bangla audio. It can also serve as a pre-trained checkpoint for fine-tuning on custom datasets with the NeMo framework.
Features
- Based on the FastConformer architecture.
- Trained on approximately 18K hours of the MegaBNSpeech corpus.
- Can be used both for direct transcription and as a starting point for fine-tuning on custom datasets (see the fine-tuning sketch under Usage Examples below).
Installation
To install NeMo, refer to the NeMo documentation. You can use the following command:
pip install -q 'nemo_toolkit[asr]'
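After installation, you can optionally verify that NeMo imports correctly and check the installed version (a quick sanity check; this assumes the package exposes __version__ as in recent releases):
import nemo
print(nemo.__version__)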
Usage Examples
Basic Usage
First, download the test audio file test_bn_fastconformer.wav.
import nemo.collections.asr as nemo_asr

# Load the pretrained Bangla FastConformer model from the Hugging Face Hub.
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# Transcribe the downloaded test file.
audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
You can also use the Colab Notebook for inference: Bangla FastConformer Infer.ipynb
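Fine-tuning
As noted above, the pretrained checkpoint can also serve as a starting point for fine-tuning with NeMo. The sketch below is a minimal outline under stated assumptions, not the authors' training recipe: the manifest paths (train_manifest.json, val_manifest.json), batch sizes, and trainer settings are placeholders to replace with your own, and depending on your NeMo version you may need to import lightning.pytorch instead of pytorch_lightning.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint as the starting point for fine-tuning.
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# Point the model at your own NeMo-style manifests
# (one JSON object per line with "audio_filepath", "duration", and "text").
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",  # placeholder path
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_validation_data(val_data_config={
    "manifest_filepath": "val_manifest.json",  # placeholder path
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

# Fine-tune with PyTorch Lightning; adjust devices and epochs for your setup.
trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=5)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
For complete recipes (optimizer and scheduler settings, tokenizer handling, distributed training), refer to the NeMo ASR fine-tuning documentation and example configs.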
Documentation
Training Datasets
| Property | Details |
| --- | --- |
| Training Data | The model was trained on 17,640.00 hours of news content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows, totaling 18,332.92 hours. |
Training Details
The training data was drawn from the MegaBNSpeech corpus and consists of 17,640.00 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows, for a total of approximately 18,332.92 hours.
Evaluation

Citation
@inproceedings{nandi-etal-2023-pseudo,
title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
author = "Nandi, Rabindra Nath and
Menon, Mehadi and
Muntasir, Tareq and
Sarker, Sagor and
Muhtaseem, Quazi Sarwar and
Islam, Md. Tariqul and
Chowdhury, Shammur and
Alam, Firoj",
editor = "Alam, Firoj and
Kar, Sudipta and
Chowdhury, Shammur Absar and
Sadeque, Farig and
Amin, Ruhul",
booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.banglalp-1.16",
doi = "10.18653/v1/2023.banglalp-1.16",
pages = "152--162",
abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}
License
This model is licensed under the CC-BY-NC-4.0 license.