titu_stt_bn_fastconformer open-source model - high-accuracy Bengali automatic speech-to-text conversion


Titu Stt Bn Fastconformer

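Developed by hishab

A Bengali automatic speech recognition model based on the FastConformer architecture, trained on approximately 18K hours of data for high-accuracy speech-to-text conversion

Tags: Speech recognition, Other  #BengaliSpeechRecognition #FastConformerArchitecture #LargeScaleNewsCorpus

Downloads: 270

Release date: 10/17/2023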

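Model overview

This model is designed specifically for Bengali speech recognition. It can be used to transcribe audio directly or as a pretrained model for fine-tuning.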

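Model features

Large-scale training data

Trained on approximately 18K hours of Bengali speech covering a range of program types.

Efficient architecture

Based on the FastConformer architecture, which balances recognition accuracy against computational cost.

Domain coverage

The training data spans several domains, including news and talk shows, which is intended to help the model generalize. Note that news channels account for most of the hours (17,640 of 18,332.92).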

Model capabilities

Bengali speech recognition

Audio transcription

Speech-to-text conversion

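Use cases

Audio transcription

News content transcription

Automatically converts Bengali news broadcasts into text

High-accuracy transcription results

Talk show transcription

Automatically transcribes Bengali talk show content

Handles a variety of accents and speaking rates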

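Education

Transcription of educational materials

Converts Bengali educational audio into text

Useful for producing subtitles and text-based learning materials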
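For the subtitle use case, here is a minimal sketch of one way to get rough subtitles from a long recording. It assumes uncompressed PCM WAV input.

1. Split the audio into fixed-length chunks with Python's standard wave module.
2. Transcribe the chunk files with the model.
3. Render the results as SRT.

split_wav, srt_timestamp and to_srt are hypothetical helpers, not part of NeMo or the model card. Fixed-length cuts can split words at chunk boundaries, so accurate subtitle timing would need word-level timestamps or voice-activity detection.

```python
import wave
from typing import List, Tuple


def split_wav(path: str, chunk_seconds: float, out_prefix: str) -> List[Tuple[str, float, float]]:
    """Split a PCM WAV file into consecutive fixed-length chunks (hypothetical helper).

    Returns (chunk_path, start_seconds, end_seconds) for each chunk written.
    """
    chunks: List[Tuple[str, float, float]] = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        frame_bytes = params.sampwidth * params.nchannels
        frames_per_chunk = max(1, int(chunk_seconds * rate))
        start_frame = 0
        index = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break  # end of file
            n_frames = len(data) // frame_bytes
            chunk_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(chunk_path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width and rate as the source
                dst.writeframes(data)  # frame count in the header is patched on close
            chunks.append((chunk_path, start_frame / rate, (start_frame + n_frames) / rate))
            start_frame += n_frames
            index += 1
    return chunks


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as numbered SRT blocks."""
    blocks = [
        f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)  # a blank line separates consecutive blocks
```

Usage with hypothetical file names:

- chunks = split_wav("lecture.wav", 20.0, "lecture_chunk")
- texts = asr_model.transcribe([p for p, _, _ in chunks])
- to_srt([(s, e, t) for (_, s, e), t in zip(chunks, texts)]) then yields the subtitle text.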

🚀 titu_stt_bn_fastconformer

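titu_stt_bn_fastconformer is a FastConformer-based model trained on approximately 18,000 hours of the MegaBNSpeech corpus. It can be used to transcribe Bengali speech. It can also serve as a pretrained model for fine-tuning on custom datasets with the NeMo framework.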

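🚀 Quick start

Install NeMo, load the pretrained checkpoint with ASRModel.from_pretrained, and pass a list of audio file paths to transcribe(), as in the usage example below. The checkpoint can also be fine-tuned on custom datasets with the NeMo framework.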

✨ Key features

Transcribes Bengali speech.
Can be fine-tuned on custom datasets using the NeMo framework.

📦 Installation

To install NeMo, refer to the NeMo documentation. The ASR extras can be installed with pip:

pip install -q 'nemo_toolkit[asr]'

💻 Usage examples

Basic usage

# pip install -q 'nemo_toolkit[asr]'

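import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hugging Face Hub and load it
asr_model = nemo_asr.models.ASRModel.from_pretrained("hishab/titu_stt_bn_fastconformer")

# transcribe() takes a list of audio file paths and returns one transcription per file
audio_file = "test_bn_fastconformer.wav"
transcriptions = asr_model.transcribe([audio_file])
print(transcriptions)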
# ['আজ সরকারি ছুটির দিন দেশের সব শিক্ষা প্রতিষ্ঠান সহ সরকারি আধা সরকারি স্বায়ত্তশাসিত প্রতিষ্ঠান ও ভবনে জাতীয় পতাকা অর্ধনমিত ও কালো পতাকা উত্তোলন করা হয়েছে']
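To transcribe many files at once, the same transcribe() call can take a longer list of paths. Below is a minimal sketch for a directory of .wav files.

- transcribe_directory is a hypothetical helper, not part of NeMo.
- It relies on transcribe() accepting a list of paths plus a batch_size keyword.
- It hedges against return-type differences between NeMo versions and model types. Some return Hypothesis objects with a .text attribute, and RNNT or hybrid models may return a (best, all) tuple.

```python
from pathlib import Path
from typing import Any, Dict, List


def transcribe_directory(model: Any, directory: str, batch_size: int = 4) -> Dict[str, str]:
    """Transcribe every .wav file in `directory`, sorted by file name (hypothetical helper).

    `model` is assumed to expose a NeMo-style transcribe(paths, batch_size=...)
    method, such as the ASRModel loaded in the basic example.
    """
    paths: List[str] = sorted(str(p) for p in Path(directory).glob("*.wav"))
    if not paths:
        return {}
    outputs = model.transcribe(paths, batch_size=batch_size)
    # Assumption: RNNT/hybrid models in some NeMo versions return a
    # (best_hypotheses, all_hypotheses) tuple; keep only the best ones.
    if isinstance(outputs, tuple):
        outputs = outputs[0]
    # Some versions return Hypothesis objects rather than strings; use .text when present.
    texts = [getattr(out, "text", out) for out in outputs]
    return dict(zip(paths, texts))
```

Usage: results = transcribe_directory(asr_model, "bn_audio/") returns a dict mapping each file path to its transcription. Here "bn_audio/" is a hypothetical directory.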

Advanced usage

The model can be fine-tuned on a custom dataset with the NeMo framework. See the NeMo documentation for details.
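NeMo's ASR training scripts typically read training and validation data from JSON-lines manifest files. Each line holds an object with audio_filepath, duration (in seconds) and text fields. The stdlib-only sketch below builds such a manifest from WAV files and their Bengali transcripts.

- It assumes uncompressed PCM WAV input, since the wave module cannot read other formats.
- The helper names are hypothetical.
- Text normalization and the training configuration itself are left to the NeMo documentation.

```python
import json
import wave
from typing import Iterable, Tuple


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs: Iterable[Tuple[str, str]], manifest_path: str) -> int:
    """Write one JSON object per (audio path, transcript) pair; return the entry count."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio_path, text in pairs:
            entry = {
                "audio_filepath": audio_path,
                "duration": round(wav_duration_seconds(audio_path), 3),
                "text": text,
            }
            # ensure_ascii=False keeps Bengali script readable in the manifest
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, write_manifest(pairs, "train_manifest.json") writes one line per clip. In this call, pairs is a list of (path, transcript) tuples and "train_manifest.json" is a hypothetical output path.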

📚 Documentation

🔧 Technical details

Training dataset

Channel category	Hours
News	17,640.00
Talk shows	688.82
Blogs	0.02
Crime shows	4.08
Total	18,332.92

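Training details

The dataset selected to train this model consists of the following, for a total of 18,332.92 hours:

- 17,640 hours of news channel content
- 688.82 hours of talk shows
- 0.02 hours of blogs
- 4.08 hours of crime shows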
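The domain balance is easy to miss in the raw hour counts. A short Python sketch recomputes the total and each category's share from the dataset table's own figures. It shows that news accounts for roughly 96% of the training hours and talk shows for under 4%. The dictionary simply copies the table, and category_shares is an illustrative helper name.

```python
from typing import Dict

# Hours per channel category, copied from the training dataset table
TRAINING_HOURS: Dict[str, float] = {
    "News": 17_640.00,
    "Talk shows": 688.82,
    "Blogs": 0.02,
    "Crime shows": 4.08,
}


def category_shares(hours: Dict[str, float]) -> Dict[str, float]:
    """Percentage of the total training hours contributed by each category."""
    total = sum(hours.values())
    return {name: 100.0 * h / total for name, h in hours.items()}


if __name__ == "__main__":
    total = sum(TRAINING_HOURS.values())
    print(f"Total: {total:,.2f} h")  # 18,332.92 h, matching the table
    for name, share in category_shares(TRAINING_HOURS).items():
        print(f"{name:<12} {TRAINING_HOURS[name]:>10,.2f} h  {share:6.2f}%")
```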

Evaluation

[Figure: evaluation results]

📄 License

This model is released under the CC BY-NC 4.0 license.

Citation

@inproceedings{nandi-etal-2023-pseudo,
    title = "Pseudo-Labeling for Domain-Agnostic {B}angla Automatic Speech Recognition",
    author = "Nandi, Rabindra Nath  and
      Menon, Mehadi  and
      Muntasir, Tareq  and
      Sarker, Sagor  and
      Muhtaseem, Quazi Sarwar  and
      Islam, Md. Tariqul  and
      Chowdhury, Shammur  and
      Alam, Firoj",
    editor = "Alam, Firoj  and
      Kar, Sudipta  and
      Chowdhury, Shammur Absar  and
      Sadeque, Farig  and
      Amin, Ruhul",
    booktitle = "Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.banglalp-1.16",
    doi = "10.18653/v1/2023.banglalp-1.16",
    pages = "152--162",
    abstract = "One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR",
}