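🚀 titu_stt_bn_fastconformer Model
titu_stt_bn_fastconformer is a FastConformer-based model trained on roughly 18,000 hours of the MegaBNSpeech corpus. It can transcribe Bangla audio, and it can also serve as a pretrained model for fine-tuning on custom datasets with the NeMo framework.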
🚀 Quick Start
Use this model to transcribe Bangla audio, or as a pretrained model for fine-tuning on custom datasets with the NeMo framework.
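✨ Key Features
- Built on the FastConformer architecture and trained on a large-scale Bangla corpus.
- Transcribes Bangla audio.
- Can be used as a pretrained model for fine-tuning on custom datasets.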
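Fine-tuning with NeMo generally starts from a manifest file that lists the custom dataset one JSON object per line, with the fields audio_filepath, duration (in seconds), and text. The sketch below builds such a manifest using only the standard library. The helper name write_manifest, the clip paths, the durations, and the transcripts are hypothetical placeholders, not part of this model card.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(samples, manifest_path):
    """Write (audio_filepath, duration_seconds, text) tuples as JSON lines."""
    path = Path(manifest_path)
    with path.open("w", encoding="utf-8") as f:
        for audio_filepath, duration, text in samples:
            entry = {
                "audio_filepath": str(audio_filepath),
                "duration": float(duration),
                "text": text,
            }
            # ensure_ascii=False keeps the Bangla script readable in the file
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path


# Hypothetical samples: replace with your own clips and transcripts
samples = [
    ("clips/sample_001.wav", 3.2, "আমি বাংলায় গান গাই"),
    ("clips/sample_002.wav", 5.75, "আমার সোনার বাংলা"),
]

# Written to a temporary directory here so the demo leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    manifest = write_manifest(samples, Path(tmp) / "train_manifest.json")
    lines = manifest.read_text(encoding="utf-8").splitlines()

print(f"wrote {len(lines)} manifest entries")
```

In practice the manifest would be written next to the training data and its path passed to the NeMo training configuration.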
📦 Installation
To install NeMo, see the NeMo documentation. It can be installed with the following command:
pip install -q 'nemo_toolkit[asr]'
💻 Usage Examples
Basic Usage
Download test_bn_fastconformer.wav
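import nemo.collections.asr as nemo_asr
# Load the pretrained model by its Hugging Face identifier
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")
# Path to the sample file downloaded above
audio_file = "test_bn_fastconformer.wav"
# transcribe() takes a list of paths and returns one result per file
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)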
Colab notebook for inference: Bangla FastConformer Infer.ipynb
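When transcribing several files, it can help to map each path to its plain-text transcript. The following is a minimal sketch under one assumption: depending on the installed NeMo version, transcribe() may return plain strings, objects carrying a text attribute, or a tuple whose first element is the list of best hypotheses. The helper names to_text and transcribe_files are hypothetical and not part of NeMo.

```python
def to_text(result):
    """Return the transcript string from a single transcribe() result."""
    if isinstance(result, str):
        return result
    # Assumption: hypothesis-style objects expose the transcript as `.text`
    text = getattr(result, "text", None)
    if isinstance(text, str):
        return text
    raise TypeError(f"unsupported transcription result: {type(result)!r}")


def transcribe_files(asr_model, audio_files):
    """Transcribe a batch of files and return {path: transcript}."""
    audio_files = list(audio_files)
    results = asr_model.transcribe(audio_files)
    # Assumption: some versions return (best_hypotheses, all_hypotheses)
    if isinstance(results, tuple):
        results = results[0]
    return {path: to_text(res) for path, res in zip(audio_files, results)}
```

With the model loaded as in the basic usage, transcribe_files(asr_model, ["test_bn_fastconformer.wav"]) would return a one-entry dictionary keyed by the file path.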
📚 Detailed Documentation
Training Dataset
Channel category | Duration (hours) |
News | 17640.00 |
Talk show | 688.82 |
Vlog | 0.02 |
Crime show | 4.08 |
Total | 18332.92 |
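Training Details
The model was trained on a dataset comprising 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.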
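To make the composition of the training data easier to see, the short sketch below recomputes the total and the share of each category from the figures reported above. The variable names are illustrative only.

```python
# Hours per channel category, copied from the training dataset table
hours_by_category = {
    "News": 17640.00,
    "Talk show": 688.82,
    "Vlog": 0.02,
    "Crime show": 4.08,
}

total_hours = sum(hours_by_category.values())

# Percentage of the total contributed by each category
shares = {
    name: 100.0 * hours / total_hours
    for name, hours in hours_by_category.items()
}

for name, hours in hours_by_category.items():
    print(f"{name:<12}{hours:>10.2f} h {shares[name]:>8.4f} %")
print(f"{'Total':<12}{total_hours:>10.2f} h")
```

The total matches the 18332.92 hours in the table, and news accounts for roughly 96% of the data, so the training set is heavily weighted toward broadcast news speech.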
Evaluation

Citation
@inproceedings{nandi-etal-2023-pseudo,
title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
author = "Nandi, Rabindra Nath and
Menon, Mehadi and
Muntasir, Tareq and
Sarker, Sagor and
Muhtaseem, Quazi Sarwar and
Islam, Md. Tariqul and
Chowdhury, Shammur and
Alam, Firoj",
editor = "Alam, Firoj and
Kar, Sudipta and
Chowdhury, Shammur Absar and
Sadeque, Farig and
Amin, Ruhul",
booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.banglalp-1.16",
doi = "10.18653/v1/2023.banglalp-1.16",
pages = "152--162",
abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}
📄 License
This project is licensed under CC BY-NC 4.0.