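🚀 titu_stt_bn_fastconformer model
titu_stt_bn_fastconformer is a FastConformer-based model trained on roughly 18,000 hours of the MegaBNSpeech corpus. It can be used to transcribe Bangla audio, or as a pretrained model for fine-tuning on custom datasets with the NeMo framework.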
🚀 Quick start
Use this model to transcribe Bangla audio, or as a pretrained checkpoint for fine-tuning on a custom dataset with the NeMo framework.
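✨ Key features
- Built on the FastConformer architecture and trained on a large-scale Bangla corpus.
- Transcribes Bangla audio.
- Can serve as a pretrained model for fine-tuning on custom datasets.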
📦 Installation
To install NeMo, see the NeMo documentation. It can be installed with the following command:
pip install -q 'nemo_toolkit[asr]'
💻 Usage examples
Basic usage
Download test_bn_fastconformer.wav
import nemo.collections.asr as nemo_asr
# Load the pretrained model by name
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")
# Transcribe the downloaded sample file
audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
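Before transcribing your own recordings, it can help to check their format. The sketch below uses only the Python standard library to report the channel count, sample rate, and duration of a PCM WAV file. The 16 kHz mono target is an assumption based on common NeMo ASR setups, not something this model card states, and the helper names `describe_wav` and `looks_like_expected_input` are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration


def looks_like_expected_input(path, expected_rate=16000):
    """True if the file is mono at expected_rate.

    The 16 kHz default is an assumption, not taken from the model card.
    """
    channels, sample_rate, _ = describe_wav(path)
    return channels == 1 and sample_rate == expected_rate
```

Calling `looks_like_expected_input("test_bn_fastconformer.wav")` on the downloaded sample is one way to see what format the authors' own test file uses.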
Colab notebook for inference: Bangla FastConformer Infer.ipynb
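For fine-tuning on a custom dataset, NeMo ASR training reads manifest files with one JSON object per line, each holding `audio_filepath`, `duration` (in seconds), and `text`. The following is a minimal sketch of writing such a manifest with the standard library. The helper name `write_manifest`, the file paths, and the sample transcripts are hypothetical, and this card does not describe the authors' own fine-tuning recipe.

```python
import json


def write_manifest(entries, manifest_path):
    """Write one JSON object per line with audio_filepath, duration and text."""
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for audio_filepath, duration, text in entries:
            record = {
                "audio_filepath": audio_filepath,
                "duration": duration,
                "text": text,
            }
            # ensure_ascii=False keeps Bangla text readable in the file.
            manifest.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical entries: (path to audio, duration in seconds, transcript)
example_entries = [
    ("clips/sample_0001.wav", 3.2, "আমি বাংলায় কথা বলি"),
    ("clips/sample_0002.wav", 1.8, "ধন্যবাদ"),
]
```

A call such as `write_manifest(example_entries, "train_manifest.json")` produces a file that can then be referenced from a NeMo training configuration.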
📚 Documentation
Training dataset
| Channel category | Duration (hours) |
| --- | --- |
| News | 17640.00 |
| Talk show | 688.82 |
| Vlog | 0.02 |
| Crime show | 4.08 |
| Total | 18332.92 |
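Training details
To train the model, we selected a dataset comprising 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.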
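The per-category hours above can be checked against the stated total with a few lines of arithmetic. The sketch below sums the four figures and computes each category's share. It uses only numbers from this card; the variable names are illustrative.

```python
# Hours per channel category, copied from the training dataset table.
hours_by_category = {
    "News": 17640.00,
    "Talk show": 688.82,
    "Vlog": 0.02,
    "Crime show": 4.08,
}

# Round to two decimals to match the precision used in the table.
total_hours = round(sum(hours_by_category.values()), 2)

# Percentage of the total contributed by each category.
shares = {
    category: 100 * hours / total_hours
    for category, hours in hours_by_category.items()
}

for category, share in shares.items():
    print(f"{category}: {share:.2f}%")
print(f"Total: {total_hours:.2f} hours")
```

The sum matches the 18332.92 hours listed in the table, and news accounts for roughly 96% of the training audio, so the corpus is heavily weighted toward broadcast news.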
Evaluation

Citation
@inproceedings{nandi-etal-2023-pseudo,
title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
author = "Nandi, Rabindra Nath and
Menon, Mehadi and
Muntasir, Tareq and
Sarker, Sagor and
Muhtaseem, Quazi Sarwar and
Islam, Md. Tariqul and
Chowdhury, Shammur and
Alam, Firoj",
editor = "Alam, Firoj and
Kar, Sudipta and
Chowdhury, Shammur Absar and
Sadeque, Farig and
Amin, Ruhul",
booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.banglalp-1.16",
doi = "10.18653/v1/2023.banglalp-1.16",
pages = "152--162",
abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}
📄 License
This project is licensed under CC BY-NC 4.0.