titu_stt_bn_fastconformer Open-Source Model - High-Accuracy Bangla Automatic Speech-to-Text

Titu Stt Bn Fastconformer

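Developed by hishab

A Bangla automatic speech recognition model based on the FastConformer architecture, trained on about 18K hours of data for high-accuracy speech-to-text.

Speech Recognition, Other #Bangla speech recognition #FastConformer architecture #large-scale news corpus

Downloads: 270

Release date: 10/17/2023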

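Model Overview

This model is designed for Bangla speech recognition. It can be used to transcribe audio or as a pretrained model for fine-tuning.

Model Features

Large-scale training data

Trained on about 18K hours of Bangla speech covering several program types.

Efficient architecture

Based on the FastConformer architecture, which balances recognition accuracy and computational efficiency.

Broad domain coverage

The training data spans news, talk shows, and other program types, which helps the model generalize across domains.

Model Capabilities

Bangla speech recognition

Audio transcription

Speech-to-text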

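Use Cases

Speech transcription

News transcription

Automatically converts Bangla news programs to text

High-accuracy transcripts

Talk show transcription

Automatically transcribes Bangla talk show content

Supports a range of accents and speaking rates

Education

Educational material transcription

Converts Bangla instructional audio to text

Useful for producing subtitles or written teaching materials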

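🚀 titu_stt_bn_fastconformer Model

titu_stt_bn_fastconformer is a FastConformer-based model trained on about 18,000 hours of the MegaBNSpeech corpus. It can be used to transcribe Bangla audio, or as a pretrained model for fine-tuning on custom datasets with the NeMo framework.

🚀 Quick Start

Use the model to transcribe Bangla audio directly, or fine-tune it on a custom dataset with the NeMo framework.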

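✨ Key Features

Based on the FastConformer architecture and trained on a large-scale Bangla corpus.
Transcribes Bangla audio.
Can serve as a pretrained model for fine-tuning on custom datasets.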
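
Fine-tuning an ASR model with NeMo generally starts from a manifest file: one JSON object per line with the keys audio_filepath, duration, and text. The following is a minimal sketch of building such a manifest with the Python standard library only. The file names, the silent stand-in clip, and the short transcript are hypothetical placeholders; a real dataset would supply its own recordings and transcripts.

```python
import json
import tempfile
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(samples, manifest_path):
    """Write one JSON object per line with audio_filepath, duration, and text."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in samples:
            entry = {
                "audio_filepath": str(audio_path),
                "duration": round(wav_duration_seconds(audio_path), 3),
                "text": text,
            }
            # ensure_ascii=False keeps the Bangla text readable in the file
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def make_silent_wav(path, seconds=1.0, rate=16000):
    """Create a silent 16-bit mono WAV file, used here only as a stand-in clip."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


workdir = Path(tempfile.mkdtemp())
clip = workdir / "clip_0001.wav"            # hypothetical file name
manifest = workdir / "train_manifest.json"  # hypothetical file name

make_silent_wav(clip, seconds=2.0)
write_manifest([(clip, "আজ সরকারি ছুটির দিন")], manifest)

manifest_lines = manifest.read_text(encoding="utf-8").splitlines()
print(manifest_lines[0])
```

The resulting file path is what a NeMo training configuration would typically point to as its training manifest. The training configuration itself is outside the scope of this sketch.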

📦 Installation

To install NeMo, see the NeMo documentation. It can be installed with the following command:

pip install -q 'nemo_toolkit[asr]'
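
After installation, it can be useful to confirm that the package is visible to the current Python environment before loading the model. The sketch below uses only the standard library and assumes the distribution name nemo_toolkit from the pip command above. The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


nemo_version = installed_version("nemo_toolkit")
if nemo_version is None:
    print("nemo_toolkit is not installed in this environment")
else:
    print(f"nemo_toolkit {nemo_version} is installed")
```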

💻 Usage Examples

Basic usage

Download test_bn_fastconformer.wav

# pip install -q 'nemo_toolkit[asr]'

import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint by name
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# transcribe() takes a list of audio file paths
audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']

Colab notebook for inference: Bangla FastConformer Infer.ipynb
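
Before transcribing your own recordings, it may help to check their format. The sketch below inspects a WAV file with the standard library and flags deviations from 16 kHz mono. That target format is an assumption based on what NeMo ASR checkpoints typically expect; this model card does not state it. The demo file is a generated stand-in, and the helper names are hypothetical.

```python
import tempfile
import wave
from pathlib import Path

# Assumption: 16 kHz is typical for NeMo ASR checkpoints; not stated in this card.
EXPECTED_RATE = 16000


def describe_wav(path):
    """Read channel count, sample rate, and duration from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def format_problems(info, expected_rate=EXPECTED_RATE):
    """List the ways a file deviates from mono audio at the expected rate."""
    problems = []
    if info["channels"] != 1:
        problems.append(f"expected mono, found {info['channels']} channels")
    if info["sample_rate"] != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {info['sample_rate']} Hz")
    return problems


# Stand-in input: one second of 16-bit stereo silence at 44.1 kHz.
demo = Path(tempfile.mkdtemp()) / "demo_stereo.wav"
with wave.open(str(demo), "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00\x00\x00" * 44100)

demo_info = describe_wav(demo)
demo_problems = format_problems(demo_info)
print(demo_info)
print(demo_problems)
```

Calling describe_wav("test_bn_fastconformer.wav") on the downloaded sample would report that file's actual format. Files that are flagged can be converted with an audio tool of your choice before transcription.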

📚 Documentation

Training dataset

Channel category	Duration (hours)
News	17640.00
Talk show	688.82
Vlog	0.02
Crime show	4.08
Total	18332.92

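Training details

To train the model, we selected a dataset comprising 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.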
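
The composition of the training data can be checked with a few lines of arithmetic. The sketch below recomputes the total from the per-category hours listed above and derives each category's share. It uses only the figures given in this card; the variable names are illustrative.

```python
# Hours per channel category, as listed in the training dataset table.
hours = {
    "News": 17640.00,
    "Talk show": 688.82,
    "Vlog": 0.02,
    "Crime show": 4.08,
}

total = round(sum(hours.values()), 2)
shares = {name: 100 * value / total for name, value in hours.items()}

print(f"Total: {total:.2f} hours")
for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```

The recomputed total matches the 18332.92 hours in the table. News accounts for roughly 96% of the data, so coverage of the other categories is comparatively thin.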

Evaluation

[Figure: evaluation results (image not included)]
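
ASR output is commonly scored with word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. This card does not state which metric its evaluation uses, so the following is a generic sketch for scoring your own transcripts, not a reproduction of the reported results. The example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "আজ সরকারি ছুটির দিন"
hypothesis = "আজ সরকারি ছুটি দিন"  # one substituted word, for illustration
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}")
```

Splitting on whitespace is a simplification. Any text normalization, such as punctuation handling, should be applied to both strings before scoring.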

Citation

@inproceedings{nandi-etal-2023-pseudo,
    title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
    author = "Nandi, Rabindra Nath  and
      Menon, Mehadi  and
      Muntasir, Tareq  and
      Sarker, Sagor  and
      Muhtaseem, Quazi Sarwar  and
      Islam, Md. Tariqul  and
      Chowdhury, Shammur  and
      Alam, Firoj",
    editor = "Alam, Firoj  and
      Kar, Sudipta  and
      Chowdhury, Shammur Absar  and
      Sadeque, Farig  and
      Amin, Ruhul",
    booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.banglalp-1.16",
    doi = "10.18653/v1/2023.banglalp-1.16",
    pages = "152--162",
    abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}