🚀 rinna/nue-asr
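rinna/nue-asr is an end-to-end speech recognition model that integrates pre-trained speech and language models. Its Japanese speech recognition accuracy is comparable to that of recent ASR models, and on a GPU it transcribes faster than real time.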
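🚀 Quick Start
The code for this model has been tested with Python 3.8.10 and 3.10.12, using PyTorch 2.1.1 and Transformers 4.35.2. It is expected to be compatible with Python 3.8 or later and with recent PyTorch versions. Transformers 4.33.0 or later is required.
First, install the code needed to run inference with this model: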
pip install git+https://github.com/rinnakk/nue-asr.git
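The version requirements above can be checked before running inference. The following is a minimal sketch that uses only the standard library; the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical and not part of nue-asr.

```python
import re
import sys
from importlib import metadata


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum`, comparing numeric components."""
    have, need = version_tuple(version), version_tuple(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that "4.33" and "4.33.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is expected")
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    else:
        if not meets_minimum(installed, "4.33.0"):
            problems.append(f"transformers {installed} is older than 4.33.0")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Local build suffixes such as `+cu118` are ignored because only the leading numeric part of the version string is compared.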
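The model provides both a command-line interface and a Python interface.
Command-line usage
The following command transcribes an audio file using the command-line interface. The audio file is automatically downsampled to 16 kHz: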
nue-asr audio1.wav
You can specify multiple audio files:
nue-asr audio1.wav audio2.flac audio3.mp3
DeepSpeed-Inference can be used to accelerate inference of the GPT-NeoX module. To use DeepSpeed-Inference, you need to install DeepSpeed:
pip install deepspeed
Then you can use DeepSpeed-Inference as follows:
nue-asr --use-deepspeed audio1.wav
Run nue-asr --help for more information.
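If the command-line tool needs to be driven from a script, the invocations shown above can be assembled programmatically. This is a sketch only: `build_command` and `run_cli` are hypothetical helpers, and because the format of the tool's output is not described here, the raw standard output is returned unparsed.

```python
import subprocess


def build_command(audio_files, use_deepspeed=False):
    """Assemble the argument list for the nue-asr command-line tool."""
    if not audio_files:
        raise ValueError("at least one audio file is required")
    command = ["nue-asr"]
    if use_deepspeed:
        command.append("--use-deepspeed")
    command.extend(str(path) for path in audio_files)
    return command


def run_cli(audio_files, use_deepspeed=False):
    """Run nue-asr and return whatever it wrote to standard output."""
    completed = subprocess.run(
        build_command(audio_files, use_deepspeed),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

For example, `build_command(["audio1.wav", "audio2.flac"], use_deepspeed=True)` produces the same arguments as typing `nue-asr --use-deepspeed audio1.wav audio2.flac` in a shell.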
Python usage
An example of the Python interface is shown below:
import nue_asr
model = nue_asr.load_model("rinna/nue-asr")
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
In addition to an audio file path, the nue_asr.transcribe function also accepts audio data as a numpy.array or a torch.Tensor.
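A sketch of the array input path follows. It reads a 16-bit PCM WAV file into a NumPy array with the standard `wave` module and hands the array to `nue_asr.transcribe`. The helpers `read_wav_as_float32` and `transcribe_array` are hypothetical. The sketch also assumes that in-memory audio should be mono, float32, and already sampled at 16 kHz, which is not stated here and should be verified against the nue-asr code.

```python
import wave

import numpy as np


def read_wav_as_float32(path):
    """Read a 16-bit PCM WAV file into a mono float32 array scaled to [-1, 1)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.readframes(wav_file.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Average the interleaved channels down to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, sample_rate


def transcribe_array(waveform):
    """Pass an in-memory waveform to nue_asr.transcribe (imported lazily)."""
    import nue_asr

    model = nue_asr.load_model("rinna/nue-asr")
    tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
    return nue_asr.transcribe(model, tokenizer, waveform).text
```

The reader returns the file's sample rate so the caller can resample first if it is not 16 kHz.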
DeepSpeed-Inference can also be used to accelerate inference through the Python interface:
import nue_asr
model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=True)
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
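Because DeepSpeed is an optional dependency, a script may want to enable it only when it is installed. The sketch below checks for the package without importing it and passes the result to `use_deepspeed`. Both helper names are hypothetical.

```python
import importlib.util


def deepspeed_available():
    """True if the deepspeed package can be found, without importing it."""
    return importlib.util.find_spec("deepspeed") is not None


def load_with_optional_deepspeed(model_name="rinna/nue-asr"):
    """Load model and tokenizer, enabling DeepSpeed-Inference only if installed."""
    import nue_asr  # imported lazily so the check above works without nue-asr

    model = nue_asr.load_model(model_name, use_deepspeed=deepspeed_available())
    tokenizer = nue_asr.load_tokenizer(model_name)
    return model, tokenizer
```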
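✨ Key Features
- Proposes a novel end-to-end speech recognition model, Nue ASR, which integrates pre-trained speech and language models.
- The name Nue comes from the Japanese word (鵺/ぬえ/Nue), one of the creatures of Japanese legend (妖怪/ようかい/Yōkai).
- Provides end-to-end Japanese speech recognition with accuracy comparable to recent ASR models.
- Achieves faster-than-real-time speech recognition when run on a GPU.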
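One common way to quantify "faster than real time" is the real-time factor (RTF): processing time divided by audio duration, where a value below 1 means the audio is transcribed faster than it plays. This is a sketch for measuring it yourself, not a benchmark from the authors. The names `real_time_factor` and `timed` are hypothetical.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For instance, wrapping a transcription call as `timed(nue_asr.transcribe, model, tokenizer, path)` and dividing the elapsed time by the clip length gives the RTF for that clip. On a GPU, the first call typically includes warm-up cost, so averaging over several clips is more representative.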
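📚 Documentation
Model architecture
The model consists of three main components: a HuBERT audio encoder, a bridge network, and a GPT-NeoX decoder. The HuBERT and GPT-NeoX weights are initialized from pre-trained HuBERT and GPT-NeoX weights, respectively.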
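The data flow between the three components can be illustrated with a toy NumPy sketch. It is not the actual implementation: the encoder is replaced by random features, and every size is a hypothetical placeholder. Two details are assumptions not stated above: that the bridge shortens the frame sequence by average pooling before projecting, and that the audio embeddings are prepended to the token embeddings as a decoder prefix.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical and chosen only to keep the sketch small.
ENCODER_DIM, DECODER_DIM, STRIDE, VOCAB = 8, 6, 4, 10


def audio_encoder(waveform, hop=320):
    """Stand-in for HuBERT: one random feature vector per `hop` samples."""
    num_frames = len(waveform) // hop
    return rng.standard_normal((num_frames, ENCODER_DIM))


def bridge_network(features, projection):
    """Stand-in bridge: average every STRIDE frames, then project to DECODER_DIM."""
    usable = (len(features) // STRIDE) * STRIDE
    pooled = features[:usable].reshape(-1, STRIDE, ENCODER_DIM).mean(axis=1)
    return pooled @ projection


def decoder_inputs(audio_embeddings, token_ids, embedding_table):
    """Prepend audio embeddings to text token embeddings as a decoder prefix."""
    return np.concatenate([audio_embeddings, embedding_table[token_ids]], axis=0)


waveform = np.zeros(16000)  # one second of silence at 16 kHz
projection = rng.standard_normal((ENCODER_DIM, DECODER_DIM))
embedding_table = rng.standard_normal((VOCAB, DECODER_DIM))

features = audio_encoder(waveform)                            # shape (50, 8)
prefix = bridge_network(features, projection)                 # shape (12, 6)
inputs = decoder_inputs(prefix, [1, 2, 3], embedding_table)   # shape (15, 6)
```

The point of the sketch is the shape bookkeeping: the bridge maps encoder-sized frames into the decoder's embedding space so that a text decoder can condition on audio.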
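Training
The model was trained on ReazonSpeech v1, a Japanese speech corpus of approximately 19,000 hours. Note that speech samples longer than 16 seconds were excluded before training.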
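The duration filter described above amounts to a simple threshold on clip length. The sketch below applies it to a made-up list of sample identifiers and durations, not real corpus entries. A clip of exactly 16 seconds is kept, since only clips longer than 16 seconds are excluded.

```python
MAX_DURATION_SECONDS = 16.0


def keep_short_samples(samples, max_seconds=MAX_DURATION_SECONDS):
    """Keep (sample_id, duration_seconds) pairs whose duration is within the limit."""
    return [(sample_id, duration) for sample_id, duration in samples
            if duration <= max_seconds]


# Hypothetical sample identifiers and durations, for illustration only.
corpus = [("utt-001", 3.2), ("utt-002", 16.0), ("utt-003", 21.7)]
kept = keep_short_samples(corpus)
```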
Contributors
Release date
December 7, 2023
Tokenization
The model uses the same SentencePiece-based tokenizer as japanese-gpt-neox-3.6b.
How to cite
@inproceedings{hono2024integrating,
title = {Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition},
author = {Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
booktitle = {Findings of the Association for Computational Linguistics ACL 2024},
month = {8},
year = {2024},
pages = {13289--13305},
url = {https://aclanthology.org/2024.findings-acl.787}
}
@misc{rinna-nue-asr,
title = {rinna/nue-asr},
author = {Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
url = {https://huggingface.co/rinna/nue-asr}
}
References
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
@article{hsu2021hubert,
title = {{HuBERT}: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
author = {Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
month = {10},
year = {2021},
volume = {29},
pages = {3451--3460},
doi = {10.1109/TASLP.2021.3122291}
}
@software{andoniangpt2021gpt,
title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
month = {8},
year = {2021},
version = {0.0.1},
doi = {10.5281/zenodo.5879544},
url = {https://www.github.com/eleutherai/gpt-neox}
}
@inproceedings{aminabadi2022deepspeed,
title = {{DeepSpeed-Inference}: enabling efficient inference of transformer models at unprecedented scale},
author = {Aminabadi, Reza Yazdani and Rajbhandari, Samyam and Awan, Ammar Ahmad and Li, Cheng and Li, Du and Zheng, Elton and Ruwase, Olatunji and Smith, Shaden and Zhang, Minjia and Rasley, Jeff and others},
booktitle = {SC22: International Conference for High Performance Computing, Networking, Storage and Analysis},
year = {2022},
pages = {1--15},
doi = {10.1109/SC41404.2022.00051}
}
📄 License



