Nue ASR: Open-Source Japanese Speech Recognition Model Integrating Two Pre-Trained Models for Accurate, Fast Recognition


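Nue ASR

Developed by rinna

Nue ASR is an end-to-end Japanese speech recognition model that integrates pre-trained speech and language models, delivering high recognition accuracy with fast inference.

Speech recognition

Transformers

License: Apache-2.0 · #JapaneseSpeechRecognition #EndToEndASR #PretrainedModelIntegration

Downloads: 722

Release date: December 7, 2023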

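Model overview

The model provides end-to-end Japanese speech recognition with accuracy comparable to recent ASR models. On a GPU, it transcribes speech faster than real time.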
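"Faster than real time" is usually expressed as a real-time factor (RTF): processing time divided by audio duration. An RTF below 1 means the audio is transcribed in less time than it takes to play. The sketch below only illustrates that arithmetic; the function names are hypothetical and the example timings are made up, not measurements of Nue ASR.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def is_faster_than_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    # RTF < 1: transcription finishes before the audio would have finished playing.
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


# Hypothetical timings: a 60-second clip transcribed in 6 seconds.
print(real_time_factor(6.0, 60.0))  # prints 0.1
```

To measure RTF yourself, time a transcription call (for example with time.perf_counter) and divide by the clip length in seconds.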

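Model features

End-to-end speech recognition

Integrates pre-trained speech and language models into a single end-to-end system.

High performance

Recognition accuracy comparable to recent ASR models, with faster-than-real-time inference on a GPU.

Pre-trained model integration

Initialized with the pre-trained weights of japanese-hubert-base and japanese-gpt-neox-3.6b.

Large-scale training data

Trained on approximately 19,000 hours of the Japanese speech corpus ReazonSpeech v1.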

Model capabilities

Japanese speech recognition

End-to-end speech-to-text

Faster-than-real-time speech processing

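Use cases

Speech transcription

Meeting minutes: transcribe recorded Japanese meetings into text, producing accurate meeting transcripts.

Subtitle generation: automatically generate subtitle text for Japanese video content.

Voice assistants

Japanese voice command recognition: accurately recognize spoken Japanese commands.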

🚀 `rinna/nue-asr`

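rinna/nue-asr is an end-to-end speech recognition model that integrates pre-trained speech and language models. It achieves Japanese speech recognition accuracy comparable to recent ASR models and runs faster than real time on a GPU.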

🚀 Quick start

The code was tested with Python 3.8.10 and 3.10.12, together with PyTorch 2.1.1 and Transformers 4.35.2. It is expected to work with Python 3.8 or later and recent PyTorch versions. Transformers 4.33.0 or later is required.
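Before installing, you may want to confirm that the environment meets the stated minimums (Python 3.8, Transformers 4.33.0). The following is a minimal stdlib-only sketch; the helper names are hypothetical, and it compares only the leading numeric part of a version string, so a pre-release such as 4.33.0.dev0 counts as 4.33.0. For strict PEP 440 ordering, packaging.version.Version is the usual tool.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)
MIN_TRANSFORMERS = "4.33.0"


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric part of a version, e.g. '2.1.1+cu121' -> (2, 1, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numerically, padding the shorter version with zeros ('4.33' == '4.33.0')."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict:
    """Report whether the running Python and installed transformers meet the minimums."""
    try:
        transformers_version = metadata.version("transformers")
        transformers_ok = meets_minimum(transformers_version, MIN_TRANSFORMERS)
    except metadata.PackageNotFoundError:
        transformers_version, transformers_ok = None, False
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "transformers_version": transformers_version,
        "transformers_ok": transformers_ok,
    }


if __name__ == "__main__":
    print(check_environment())
```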

First, install the code required for inference with this model:

pip install git+https://github.com/rinnakk/nue-asr.git

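The model provides both a command-line interface and a Python interface.

Command-line usage

The following command transcribes an audio file with the command-line interface. The audio is automatically downsampled to 16 kHz: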

nue-asr audio1.wav

You can specify multiple audio files:

nue-asr audio1.wav audio2.flac audio3.mp3

DeepSpeed-Inference can be used to speed up inference of the GPT-NeoX module. To use it, install DeepSpeed:

pip install deepspeed

Then enable DeepSpeed-Inference as follows:

nue-asr --use-deepspeed audio1.wav

Run nue-asr --help for more information.
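If you want to drive the CLI from a script (for example, over a folder of recordings), one option is to build the argument list and call it with subprocess. This is a sketch under assumptions: the helper names are hypothetical, only the --use-deepspeed flag shown above is used, the audio suffix list simply mirrors the examples, and the CLI's output format is returned as-is without parsing.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Formats taken from the examples above; other formats may also be accepted.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3"}


def build_nue_asr_command(audio_files: Sequence[str], use_deepspeed: bool = False) -> List[str]:
    """Build the argv list for the nue-asr CLI."""
    if not audio_files:
        raise ValueError("at least one audio file is required")
    command = ["nue-asr"]
    if use_deepspeed:
        command.append("--use-deepspeed")  # requires `pip install deepspeed`
    command.extend(str(path) for path in audio_files)
    return command


def collect_audio_files(directory: str) -> List[str]:
    """Hypothetical helper: list audio files in a directory, sorted for a stable order."""
    return sorted(
        str(path) for path in Path(directory).iterdir()
        if path.suffix.lower() in AUDIO_SUFFIXES
    )


def run_nue_asr(audio_files: Sequence[str], use_deepspeed: bool = False) -> str:
    """Run the CLI once for all files and return its stdout (raises on non-zero exit)."""
    completed = subprocess.run(
        build_nue_asr_command(audio_files, use_deepspeed),
        check=True, capture_output=True, text=True,
    )
    return completed.stdout
```

For example, run_nue_asr(collect_audio_files("recordings")) would pass every matching file in a hypothetical recordings directory to a single nue-asr invocation, so the model is loaded once rather than per file.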

Python usage

An example of the Python interface:

import nue_asr

model = nue_asr.load_model("rinna/nue-asr")
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)

In addition to an audio file path, nue_asr.transcribe also accepts audio data as a numpy.ndarray or a torch.Tensor.
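When passing audio as an array, you need the waveform in memory first. The stdlib-only sketch below reads a 16-bit PCM WAV into mono floats in [-1.0, 1.0). It assumes (not confirmed by this card) that array input should already be 16 kHz mono floating-point audio, and it rejects other sample rates rather than resampling. Both helper names are hypothetical; make_test_tone_wav only exists to produce a synthetic file for trying the pipeline.

```python
import array
import io
import math
import sys
import wave
from typing import List

TARGET_SAMPLE_RATE = 16_000  # assumption: in-memory audio should already be 16 kHz


def load_wav_as_float(source) -> List[float]:
    """Read a 16-bit PCM WAV (path or file object) into mono floats in [-1.0, 1.0)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if wav.getframerate() != TARGET_SAMPLE_RATE:
            raise ValueError(
                f"expected {TARGET_SAMPLE_RATE} Hz, got {wav.getframerate()} Hz; resample first"
            )
        channels = wav.getnchannels()
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Average interleaved channels down to mono, then scale int16 to float.
    return [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]


def make_test_tone_wav(seconds: float = 1.0, freq_hz: float = 440.0,
                       sample_rate: int = TARGET_SAMPLE_RATE) -> io.BytesIO:
    """Hypothetical helper: an in-memory mono sine-wave WAV at half amplitude."""
    n = int(seconds * sample_rate)
    pcm = array.array("h", (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * t / sample_rate))
        for t in range(n)
    ))
    if sys.byteorder == "big":
        pcm.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    buffer.seek(0)
    return buffer
```

You could then convert the list with numpy.asarray(samples, dtype=numpy.float32) and pass it to nue_asr.transcribe(model, tokenizer, waveform); check the expected dtype, value range, and sample rate for array input against the nue-asr repository.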

DeepSpeed-Inference can also be enabled in the Python interface:

import nue_asr

model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=True)
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
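Because DeepSpeed is an optional dependency, a script may want to request it only when it is installed. The sketch below builds the keyword arguments for load_model accordingly. The helper names are hypothetical, and the extra CUDA check reflects an assumption that DeepSpeed-Inference is only worth enabling on a GPU.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` can be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def cuda_available() -> bool:
    # Import torch lazily so this helper also works where PyTorch is absent.
    if not module_available("torch"):
        return False
    import torch
    return torch.cuda.is_available()


def load_model_kwargs(prefer_deepspeed: bool = True) -> dict:
    """Keyword arguments for nue_asr.load_model: request DeepSpeed only if usable."""
    use_deepspeed = prefer_deepspeed and module_available("deepspeed") and cuda_available()
    return {"use_deepspeed": use_deepspeed}


# Usage (requires nue_asr):
# model = nue_asr.load_model("rinna/nue-asr", **load_model_kwargs())
```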

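✨ Key features

Nue ASR is a novel end-to-end speech recognition model that integrates pre-trained speech and language models.
The name Nue comes from the Japanese word 鵺 (ぬえ, Nue), one of the legendary creatures of Japanese folklore (妖怪/ようかい/Yōkai).
It provides end-to-end Japanese speech recognition with accuracy comparable to recent ASR models.
On a GPU, it recognizes speech faster than real time.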

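📚 Detailed documentation

Model architecture

The model consists of three main components: a HuBERT audio encoder, a bridge network, and a GPT-NeoX decoder. The HuBERT and GPT-NeoX weights were initialized from the pre-trained japanese-hubert-base and japanese-gpt-neox-3.6b checkpoints, respectively.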
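To make the encoder, bridge, decoder data flow concrete, the sketch below traces (time, feature) shapes for a 16 kHz waveform. It assumes that japanese-hubert-base uses the standard HuBERT-base convolutional front end (seven convolutions, total stride 320 samples, 768-dimensional output), and it takes 2816, the hidden size reported for japanese-gpt-neox-3.6b, as the decoder width. The bridge downsampling factor is a hypothetical placeholder; see the paper cited below for the actual bridge design.

```python
# Assumed HuBERT-base feature-encoder configuration (same as wav2vec 2.0 base).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
HUBERT_DIM = 768      # HuBERT-base hidden size
GPT_NEOX_DIM = 2816   # hidden size reported for japanese-gpt-neox-3.6b


def hubert_num_frames(num_samples: int) -> int:
    """Number of encoder frames for a waveform of num_samples (no padding)."""
    if num_samples < 400:
        raise ValueError("need at least 400 samples (25 ms at 16 kHz)")
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def pipeline_shapes(num_samples: int, bridge_downsample: int = 4) -> dict:
    """Trace shapes through encoder -> bridge.

    bridge_downsample is HYPOTHETICAL; the bridge output is assumed to be
    projected to the decoder's hidden size so GPT-NeoX can consume it.
    """
    frames = hubert_num_frames(num_samples)
    return {
        "hubert_out": (frames, HUBERT_DIM),
        "bridge_out": (frames // bridge_downsample, GPT_NEOX_DIM),
    }
```

Under these assumptions, one second of audio (16,000 samples) yields 49 HuBERT frames, and a 16-second clip yields 799; how many positions the decoder actually receives depends on the real bridge network.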

Training

The model was trained on approximately 19,000 hours of the Japanese speech corpus ReazonSpeech v1. Speech samples longer than 16 seconds were excluded before training.
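The 16-second cutoff amounts to a simple duration filter over the corpus. The sketch below uses a hypothetical Utterance record with a sample count and sample rate; ReazonSpeech's actual metadata format and rinna's exact preprocessing may differ. The card does not say whether a clip of exactly 16 s was kept, so keeping it here is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_SECONDS = 16.0  # samples longer than this were excluded before training


@dataclass
class Utterance:
    """Hypothetical record for one corpus entry."""
    audio_path: str
    transcript: str
    num_samples: int
    sample_rate: int = 16_000

    @property
    def duration(self) -> float:
        return self.num_samples / self.sample_rate


def filter_by_duration(utterances: Iterable[Utterance],
                       max_seconds: float = MAX_SECONDS) -> List[Utterance]:
    """Keep utterances no longer than max_seconds (boundary kept by assumption)."""
    return [u for u in utterances if u.duration <= max_seconds]


def total_hours(utterances: Iterable[Utterance]) -> float:
    """Total audio duration in hours, e.g. to compare with the ~19,000-hour figure."""
    return sum(u.duration for u in utterances) / 3600.0
```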


Release date

December 7, 2023

Tokenization

The model uses the same SentencePiece-based tokenizer as japanese-gpt-neox-3.6b.

Citation

@inproceedings{hono2024integrating,
    title = {Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition},
    author = {Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
    booktitle = {Findings of the Association for Computational Linguistics: ACL 2024},
    month = {8},
    year = {2024},
    pages = {13289--13305},
    url = {https://aclanthology.org/2024.findings-acl.787}
}

@misc{rinna-nue-asr,
    title = {rinna/nue-asr},
    author = {Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/nue-asr}
}

References

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

@article{hsu2021hubert,
    title = {{HuBERT}: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
    author = {Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    month = {10},
    year = {2021},
    volume = {29},
    pages = {3451--3460},
    doi = {10.1109/TASLP.2021.3122291}
}

@software{andoniangpt2021gpt,
    title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
    author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Wang, Phil and Weinbach, Samuel},
    month = {8},
    year = {2021},
    version = {0.0.1},
    doi = {10.5281/zenodo.5879544},
    url = {https://www.github.com/eleutherai/gpt-neox}
}

@inproceedings{aminabadi2022deepspeed,
    title = {{DeepSpeed-Inference}: enabling efficient inference of transformer models at unprecedented scale},
    author = {Aminabadi, Reza Yazdani and Rajbhandari, Samyam and Awan, Ammar Ahmad and Li, Cheng and Li, Du and Zheng, Elton and Ruwase, Olatunji and Smith, Shaden and Zhang, Minjia and Rasley, Jeff and others},
    booktitle = {SC22: International Conference for High Performance Computing, Networking, Storage and Analysis},
    year = {2022},
    pages = {1--15},
    doi = {10.1109/SC41404.2022.00051}
}