titu_stt_bn_fastconformer
titu_stt_bn_fastconformer is a FastConformer-based model for Bangla automatic speech recognition, trained on a large-scale corpus to deliver high-quality transcription.
Quick Start
The titu_stt_bn_fastconformer model is designed for transcribing Bangla audio. It can also serve as a pre-trained checkpoint for fine-tuning on custom datasets with the NeMo framework.
Features
- Based on the FastConformer architecture.
- Trained on approximately 18K hours of the MegaBNSpeech corpus.
- Can be used both for direct transcription and as a starting point for fine-tuning on custom datasets (see the fine-tuning sketch under Usage Examples below).
Installation
To install NeMo, refer to the NeMo documentation. You can use the following command:
pip install -q 'nemo_toolkit[asr]'
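After installation, you can optionally verify that NeMo imports correctly and check the installed version (a quick sanity check; this assumes the package exposes __version__ as in recent releases):
import nemo
print(nemo.__version__)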
Usage Examples
Basic Usage
First, download the test audio file test_bn_fastconformer.wav.
import nemo.collections.asr as nemo_asr

# Load the pretrained Bangla FastConformer model from the Hugging Face Hub.
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# Transcribe the downloaded test file.
audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
You can also use the Colab Notebook for inference: Bangla FastConformer Infer.ipynb
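Fine-tuning
As noted above, the pretrained checkpoint can also serve as a starting point for fine-tuning with NeMo. The sketch below is a minimal outline under stated assumptions, not the authors' training recipe: the manifest paths (train_manifest.json, val_manifest.json), batch sizes, and trainer settings are placeholders to replace with your own, and depending on your NeMo version you may need to import lightning.pytorch instead of pytorch_lightning.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint as the starting point for fine-tuning.
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# Point the model at your own NeMo-style manifests
# (one JSON object per line with "audio_filepath", "duration", and "text").
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",  # placeholder path
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_validation_data(val_data_config={
    "manifest_filepath": "val_manifest.json",  # placeholder path
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

# Fine-tune with PyTorch Lightning; adjust devices and epochs for your setup.
trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=5)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
For complete recipes (optimizer and scheduler settings, tokenizer handling, distributed training), refer to the NeMo ASR fine-tuning documentation and example configs.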
Documentation
Training Datasets
| Property | Details |
| --- | --- |
| Training Data | The model was trained on 17,640.00 hours of news content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows, totaling 18,332.92 hours. |
Training Details
The training data was drawn from the MegaBNSpeech corpus and consists of 17,640.00 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows, for a total of approximately 18,332.92 hours.
Evaluation

Citation
@inproceedings{nandi-etal-2023-pseudo,
title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
author = "Nandi, Rabindra Nath and
Menon, Mehadi and
Muntasir, Tareq and
Sarker, Sagor and
Muhtaseem, Quazi Sarwar and
Islam, Md. Tariqul and
Chowdhury, Shammur and
Alam, Firoj",
editor = "Alam, Firoj and
Kar, Sudipta and
Chowdhury, Shammur Absar and
Sadeque, Farig and
Amin, Ruhul",
booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.banglalp-1.16",
doi = "10.18653/v1/2023.banglalp-1.16",
pages = "152--162",
abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}
License
This model is licensed under the CC-BY-NC-4.0 license.