🚀 rinna/nue-asr
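rinna/nue-asr is an end-to-end speech recognition model that integrates pre-trained speech and language models. It offers Japanese speech recognition accuracy comparable to recent ASR models and runs faster than real time on a GPU.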
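🚀 Quick Start
The code for this model has been tested with Python 3.8.10 and 3.10.12, together with PyTorch 2.1.1 and Transformers 4.35.2. The codebase is expected to be compatible with Python 3.8 or later and with recent PyTorch versions. Transformers must be version 4.33.0 or later.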
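The version requirements above can be checked before installing anything else. The following is a minimal sketch using only the standard library; the helper names `parse_version`, `meets_minimum`, and `check_environment` are illustrative and not part of nue-asr, and the parser deliberately handles only simple dotted version strings.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn '2.1.1+cu118' into (2, 1, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '4.33' compares equal to '4.33.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Collect readable problems; an empty list means nothing was flagged."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is expected")
    try:
        transformers_version = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    else:
        if not meets_minimum(transformers_version, "4.33.0"):
            problems.append(
                f"transformers {transformers_version} is older than 4.33.0"
            )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

PyTorch is left out of the check on purpose, since the text only says that recent versions are expected and gives no minimum.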
First, install the code required for inference with this model:
pip install git+https://github.com/rinnakk/nue-asr.git
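The model provides a command-line interface and a Python interface.
Command-line usage
The following command transcribes an audio file using the command-line interface. Audio files are automatically downsampled to 16 kHz: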
nue-asr audio1.wav
You can specify multiple audio files:
nue-asr audio1.wav audio2.flac audio3.mp3
DeepSpeed-Inference can be used to accelerate inference of the GPT-NeoX module. To use DeepSpeed-Inference, install DeepSpeed:
pip install deepspeed
Then you can use DeepSpeed-Inference as follows:
nue-asr --use-deepspeed audio1.wav
Run nue-asr --help for more information.
Python usage
An example of the Python interface:
import nue_asr
model = nue_asr.load_model("rinna/nue-asr")
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
In addition to an audio file path, the nue_asr.transcribe function accepts audio data as a numpy.array or a torch.Tensor.
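As a sketch of the array input path, the example below prepares an in-memory waveform and passes it to `nue_asr.transcribe`. It assumes that arrays should be mono and sampled at 16 kHz, matching the rate used for files; the text does not say whether arrays are resampled automatically. The helpers `to_mono_16k` and `transcribe_array` are hypothetical, and the linear interpolation is a crude stand-in for a proper resampler with anti-aliasing.

```python
import numpy as np

# Assumption: arrays should match the 16 kHz rate that audio files are converted to.
TARGET_SAMPLE_RATE = 16_000


def to_mono_16k(waveform, sample_rate):
    """Hypothetical helper: downmix to mono and resample by linear interpolation."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 2:
        # Assumed layout: (channels, samples); average the channels.
        waveform = waveform.mean(axis=0)
    if sample_rate != TARGET_SAMPLE_RATE:
        duration = waveform.shape[0] / sample_rate
        target_length = int(round(duration * TARGET_SAMPLE_RATE))
        old_times = np.arange(waveform.shape[0]) / sample_rate
        new_times = np.arange(target_length) / TARGET_SAMPLE_RATE
        waveform = np.interp(new_times, old_times, waveform)
    return waveform.astype(np.float32)


def transcribe_array(waveform, sample_rate):
    """Pass an in-memory waveform to nue_asr.transcribe instead of a file path."""
    import nue_asr  # imported lazily so the helper above works without the package

    model = nue_asr.load_model("rinna/nue-asr")
    tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
    audio = to_mono_16k(waveform, sample_rate)
    return nue_asr.transcribe(model, tokenizer, audio).text
```

For real recordings, loading the file with an audio library that resamples properly is likely a better choice than the interpolation shown here.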
DeepSpeed-Inference can also be used to accelerate inference through the Python interface:
import nue_asr
model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=True)
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
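The command-line interface accepts several files at once. A similar batch loop over the Python interface might look like the sketch below, which loads the model once and reuses it for every file. The names `transcribe_files` and `make_nue_asr_transcriber` are illustrative, and mapping a failed file to `None` is a design choice made here, not a behavior of nue-asr.

```python
def transcribe_files(paths, transcribe_one):
    """Map each path to its transcript, or to None if transcription raised."""
    transcripts = {}
    for path in paths:
        try:
            transcripts[path] = transcribe_one(path)
        except Exception as error:  # keep going so one bad file does not stop the batch
            print(f"skipping {path}: {error}")
            transcripts[path] = None
    return transcripts


def make_nue_asr_transcriber(use_deepspeed=False):
    """Load the model once and return a function that transcribes one file."""
    import nue_asr  # imported lazily; requires the nue-asr package

    model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=use_deepspeed)
    tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

    def transcribe_one(path):
        return nue_asr.transcribe(model, tokenizer, path).text

    return transcribe_one
```

A call such as `transcribe_files(["audio1.wav", "audio2.flac"], make_nue_asr_transcriber())` would then return a dictionary keyed by file path.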
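✨ Key Features
- Proposes Nue ASR, a novel end-to-end speech recognition model that integrates pre-trained speech and language models.
- The name Nue comes from the Japanese word (鵺/ぬえ/Nue), one of the legendary creatures of Japan (妖怪/ようかい/Yōkai).
- Provides end-to-end Japanese speech recognition with accuracy comparable to recent ASR models.
- Achieves faster-than-real-time speech recognition when run on a GPU.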
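"Faster than real time" is commonly quantified with the real-time factor (RTF): processing time divided by audio duration, where a value below 1 means the audio is transcribed faster than it plays. The sketch below shows one way to measure it; the function names are illustrative, and no measured values from the model are implied.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe_one, audio, audio_seconds):
    """Time a single call to transcribe_one(audio) and convert it to an RTF."""
    start = time.perf_counter()
    transcribe_one(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

Here `transcribe_one` can be any callable that runs a transcription, for example a small wrapper around `nue_asr.transcribe`.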
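📚 Detailed Documentation
Model architecture
The model consists of three main components: a HuBERT audio encoder, a bridge network, and a GPT-NeoX decoder. The HuBERT and GPT-NeoX weights are initialized from pre-trained HuBERT and GPT-NeoX weights, respectively.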
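To make the data flow between the three components concrete, here is a toy shape-level sketch in NumPy. It is not the model's implementation: the text does not describe the bridge network's internals, so a single linear map stands in for it, all dimensions are made-up placeholders, and feeding the audio embeddings to the decoder as a prefix before the text embeddings is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative placeholders, not the model's real dimensions.
NUM_FRAMES, ENCODER_DIM, DECODER_DIM, NUM_TEXT_TOKENS = 50, 8, 16, 4


def bridge(encoder_features, weight, bias):
    """Stand-in bridge: a linear map from encoder space to decoder embedding space."""
    return encoder_features @ weight + bias


# Pretend output of the audio encoder: one feature vector per audio frame.
encoder_features = rng.standard_normal((NUM_FRAMES, ENCODER_DIM))
weight = rng.standard_normal((ENCODER_DIM, DECODER_DIM))
bias = np.zeros(DECODER_DIM)

audio_embeddings = bridge(encoder_features, weight, bias)

# Pretend embeddings of the text tokens generated so far.
text_embeddings = rng.standard_normal((NUM_TEXT_TOKENS, DECODER_DIM))

# Assumed layout: audio embeddings first, followed by the text embeddings.
decoder_inputs = np.concatenate([audio_embeddings, text_embeddings], axis=0)
print(decoder_inputs.shape)  # (54, 16)
```

The point of the sketch is only that the bridge makes the encoder output dimensionally compatible with the decoder's embedding space.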
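Training
The model was trained on ReazonSpeech v1, a Japanese speech corpus of approximately 19,000 hours. Note that speech samples longer than 16 seconds were excluded before training.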
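The 16-second cutoff can be expressed as a simple duration filter. The sketch below assumes each sample is described by a dictionary with hypothetical `num_samples` and `sample_rate` keys; it illustrates the rule stated above and is not the authors' preprocessing code.

```python
MAX_DURATION_SECONDS = 16.0  # threshold stated in the text


def duration_seconds(num_samples, sample_rate):
    """Length of a waveform in seconds."""
    return num_samples / sample_rate


def keep_for_training(samples, max_seconds=MAX_DURATION_SECONDS):
    """Drop samples longer than max_seconds; shorter or equal ones are kept."""
    return [
        sample
        for sample in samples
        if duration_seconds(sample["num_samples"], sample["sample_rate"]) <= max_seconds
    ]
```

Samples of exactly 16 seconds are kept here, since the text only excludes samples that exceed that length.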
Contributors
Release date
December 7, 2023
Tokenization
The model uses the same SentencePiece-based tokenizer as japanese-gpt-neox-3.6b.
How to cite
@inproceedings{hono2024integrating,
title = {Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition},
author = {Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
booktitle = {Findings of the Association for Computational Linguistics ACL 2024},
month = {8},
year = {2024},
pages = {13289--13305},
url = {https://aclanthology.org/2024.findings-acl.787}
}
@misc{rinna-nue-asr,
title = {rinna/nue-asr},
author = {Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
url = {https://huggingface.co/rinna/nue-asr}
}
References
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
@article{hsu2021hubert,
title = {{HuBERT}: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
author = {Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
month = {10},
year = {2021},
volume = {29},
pages = {3451--3460},
doi = {10.1109/TASLP.2021.3122291}
}
@software{andoniangpt2021gpt,
title = {{GPT}-{N}eo{X}: Large Scale Autoregressive Language Modeling in {P}y{T}orch},
author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
month = {8},
year = {2021},
version = {0.0.1},
doi = {10.5281/zenodo.5879544},
url = {https://www.github.com/eleutherai/gpt-neox}
}
@inproceedings{aminabadi2022deepspeed,
title = {{DeepSpeed-Inference}: enabling efficient inference of transformer models at unprecedented scale},
author = {Aminabadi, Reza Yazdani and Rajbhandari, Samyam and Awan, Ammar Ahmad and Li, Cheng and Li, Du and Zheng, Elton and Ruwase, Olatunji and Smith, Shaden and Zhang, Minjia and Rasley, Jeff and others},
booktitle = {SC22: International Conference for High Performance Computing, Networking, Storage and Analysis},
year = {2022},
pages = {1--15},
doi = {10.1109/SC41404.2022.00051}
}
📄 License



