titu_stt_bn_fastconformer open-source model - high-accuracy Bangla automatic speech-to-text

Titu Stt Bn Fastconformer

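Developed by hishab

A Bangla automatic speech recognition model based on the FastConformer architecture, trained on roughly 18K hours of data for high-accuracy speech-to-text.

Speech Recognition, Other #Bangla speech recognition #FastConformer architecture #Large-scale news corpus

Downloads: 270

Release date: 10/17/2023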

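Model Overview

This model is designed for Bangla speech recognition. It can be used to transcribe audio or as a pretrained model for fine-tuning.

Model Features

Large-scale training data

Trained on roughly 18K hours of Bangla speech covering several program types.

Efficient architecture

Built on the FastConformer architecture, which balances recognition accuracy against computational cost.

Broad domain coverage

The training data spans news, talk shows, and other domains, which helps the model generalize.

Model Capabilities

Bangla speech recognition

Audio transcription

Speech-to-text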

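Use Cases

Speech transcription

News transcription

Automatically converts Bangla news programs to text

High-accuracy transcripts

Talk show transcription

Automatically transcribes Bangla talk show content

Handles a range of accents and speaking rates

Education

Educational material transcription

Converts Bangla teaching audio to text

Useful for producing subtitles or written teaching materials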

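🚀 titu_stt_bn_fastconformer Model

titu_stt_bn_fastconformer is a FastConformer-based model trained on roughly 18,000 hours of the MegaBNSpeech corpus. It can transcribe Bangla audio, and it can serve as a pretrained model for fine-tuning on custom datasets with the NeMo framework.

🚀 Quick Start

Use the model to transcribe Bangla audio, or fine-tune it on a custom dataset with the NeMo framework.

✨ Key Features

Based on the FastConformer architecture and trained on a large-scale Bangla corpus.
Transcribes Bangla audio.
Can be fine-tuned on custom datasets as a pretrained model.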
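
Fine-tuning with NeMo usually starts from a manifest file that lists the training audio. The sketch below writes one, assuming the common NeMo ASR manifest layout of one JSON object per line with `audio_filepath`, `duration`, and `text` fields, and assuming the clips are PCM WAV files. The function names `wav_duration_seconds` and `write_manifest` are illustrative, not part of NeMo.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs, manifest_path) -> int:
    """Write a JSON-lines manifest and return the number of entries.

    pairs is an iterable of (wav_path, transcript) tuples.
    The field names follow the assumed NeMo ASR manifest layout.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, text in pairs:
            entry = {
                "audio_filepath": str(wav_path),
                "duration": round(wav_duration_seconds(Path(wav_path)), 3),
                "text": text,
            }
            # ensure_ascii=False keeps Bangla transcripts readable in the file.
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `write_manifest([("clip_0001.wav", "transcript text")], "train_manifest.json")` with your own paths and transcripts produces a file that a NeMo training configuration can point to. The file names here are placeholders.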

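📦 Installation

For full installation instructions, see the NeMo documentation. The following command installs NeMo with its ASR dependencies: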

pip install -q 'nemo_toolkit[asr]'
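
A quick way to confirm the install worked is to check that the package can be located before loading any model. This is a minimal sketch using only the standard library. The helper name `is_installed` is illustrative, and it assumes `nemo` is the top-level package that `nemo_toolkit` provides.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: nemo_toolkit installs a top-level package named "nemo".
    if is_installed("nemo"):
        print("NeMo found")
    else:
        print("NeMo not found; run: pip install 'nemo_toolkit[asr]'")
```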

💻 Usage Examples

Basic usage

Download test_bn_fastconformer.wav

# pip install -q 'nemo_toolkit[asr]'

import nemo.collections.asr as nemo_asr

# Load the pretrained checkpoint by its model name.
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# transcribe() takes a list of audio file paths.
audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)
# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']

Colab notebook for inference: Bangla FastConformer Infer.ipynb
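
Before transcribing your own recordings, it can help to inspect their format. The sketch below reads the header of a PCM WAV file with the standard library. The 16 kHz mono target is an assumption based on what NeMo ASR models commonly expect, since this page does not state the model's input format. Both function names are illustrative.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format fields from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate if rate else 0.0,
        }


def matches_expected_format(info: dict, sample_rate: int = 16000, channels: int = 1) -> bool:
    """Compare against an assumed target format (16 kHz mono by default)."""
    return info["sample_rate"] == sample_rate and info["channels"] == channels
```

For example, `matches_expected_format(describe_wav("test_bn_fastconformer.wav"))` reports whether the sample file matches the assumed format. Files that do not match can be resampled with a tool of your choice before transcription.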

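📚 Detailed Documentation

Training Dataset

Channel category	Duration (hours)
News	17640.00
Talk shows	688.82
Vlogs	0.02
Crime shows	4.08
Total	18332.92

Training Details

The training set comprises 17,640 hours of news channel content, 688.82 hours of talk shows, 0.02 hours of vlogs, and 4.08 hours of crime shows.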
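
The per-category hours can be summed and turned into shares of the total with a few lines of arithmetic. The snippet below uses only the figures listed above. The English category labels and variable names are illustrative.

```python
# Hours per channel category, copied from the training dataset table.
hours_by_category = {
    "News": 17640.00,
    "Talk shows": 688.82,
    "Vlogs": 0.02,
    "Crime shows": 4.08,
}

total_hours = sum(hours_by_category.values())

# Fraction of the total training hours contributed by each category.
shares = {name: hours / total_hours for name, hours in hours_by_category.items()}

for name, hours in hours_by_category.items():
    print(f"{name:<12} {hours:>10.2f} h  {shares[name]:7.2%}")
print(f"{'Total':<12} {total_hours:>10.2f} h")
```

The sum reproduces the 18332.92-hour total in the table. News accounts for roughly 96% of the hours, which is worth keeping in mind when applying the model to audio from other domains.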

Evaluation

[Figure: evaluation results (image not reproduced here)]
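
The evaluation results are given only as an image. To run a comparison on your own test set, one option is word error rate (WER), a common metric for ASR. Whether the figure reports WER is an assumption. The sketch below computes WER as the word-level edit distance divided by the number of reference words. It splits on whitespace only and applies no text normalization, and the function names are illustrative.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, 1):
        cur = [i]
        for j, hyp_tok in enumerate(hyp, 1):
            cost = 0 if ref_tok == hyp_tok else 1
            cur.append(min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

In practice, `reference` would be a human transcript and `hypothesis` the model output for the same clip. Averaging should be done over total errors and total reference words across the test set, not over per-utterance rates.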

Citation

@inproceedings{nandi-etal-2023-pseudo,
    title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
    author = "Nandi, Rabindra Nath  and
      Menon, Mehadi  and
      Muntasir, Tareq  and
      Sarker, Sagor  and
      Muhtaseem, Quazi Sarwar  and
      Islam, Md. Tariqul  and
      Chowdhury, Shammur  and
      Alam, Firoj",
    editor = "Alam, Firoj  and
      Kar, Sudipta  and
      Chowdhury, Shammur Absar  and
      Sadeque, Farig  and
      Amin, Ruhul",
    booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.banglalp-1.16",
    doi = "10.18653/v1/2023.banglalp-1.16",
    pages = "152--162",
    abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on pseudo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available. https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}