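🚀 Sound Recognition with ECAPA Embeddings on UrbanSound8k
This repository provides all the necessary tools to perform sound recognition with a model pretrained on UrbanSound8k using SpeechBrain. You can download the dataset here. The system recognizes the following 10 sound classes: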
dog_bark, children_playing, air_conditioner, street_music, gun_shot, siren, engine_idling, jackhammer, drilling, car_horn
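For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the test set is:

| Release date | 1-fold accuracy (%) |
|:---:|:---:|
| 04-06-21 | 75.5 |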
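
The accuracy above is reported on a single fold. UrbanSound8K ships with 10 predefined folds listed in the `fold` column of its metadata CSV. The sketch below shows one way to build a fold-based split. The choice of folds 1-8 for training, 9 for validation, and 10 for testing is an assumption for illustration and is not confirmed by this document; the sample rows are made up and contain only a subset of the real columns.

```python
import csv
import io

# Tiny in-memory stand-in for metadata/UrbanSound8K.csv (rows are made up).
SAMPLE_METADATA = """slice_file_name,fold,classID,class
a.wav,1,3,dog_bark
b.wav,9,8,siren
c.wav,10,1,car_horn
d.wav,2,4,drilling
"""

def split_by_fold(rows, valid_fold=9, test_fold=10):
    """Group metadata rows into train/valid/test by their predefined fold.

    The default fold assignment is an assumption, not the documented setup.
    """
    splits = {"train": [], "valid": [], "test": []}
    for row in rows:
        fold = int(row["fold"])
        if fold == test_fold:
            splits["test"].append(row["slice_file_name"])
        elif fold == valid_fold:
            splits["valid"].append(row["slice_file_name"])
        else:
            splits["train"].append(row["slice_file_name"])
    return splits

rows = list(csv.DictReader(io.StringIO(SAMPLE_METADATA)))
splits = split_by_fold(rows)
print(splits)
```

Keeping the predefined folds intact matters because slices cut from the same original recording share a fold; a random split could leak near-duplicates between training and test data.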
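🚀 Quick Start
The system consists of an ECAPA model coupled with statistical pooling. On top of that, a classifier is trained with categorical cross-entropy loss.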
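
To make the pooling-plus-classifier idea concrete, here is a minimal PyTorch sketch. It is not the model's actual code: `StatsPoolClassifier`, the feature dimension of 64, and the random inputs are all hypothetical, and it uses plain mean/standard-deviation pooling, whereas the real model may use an attention-weighted variant.

```python
import torch
import torch.nn as nn

class StatsPoolClassifier(nn.Module):
    """Hypothetical head: mean+std pooling over time, then a linear classifier."""

    def __init__(self, feat_dim: int, n_classes: int = 10):
        super().__init__()
        # Pooling concatenates mean and std, so the input size doubles.
        self.linear = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: [batch, time, feat_dim] frame-level features from an encoder
        mean = frames.mean(dim=1)
        std = frames.std(dim=1)
        pooled = torch.cat([mean, std], dim=1)  # [batch, 2 * feat_dim]
        return self.linear(pooled)  # unnormalized class scores (logits)

torch.manual_seed(0)
head = StatsPoolClassifier(feat_dim=64)
frames = torch.randn(4, 100, 64)     # stand-in for encoder outputs
labels = torch.tensor([0, 3, 5, 9])  # arbitrary class indices in [0, 10)
logits = head(frames)
# Categorical cross-entropy between the logits and the integer labels.
loss = nn.functional.cross_entropy(logits, labels)
print(logits.shape, loss.item())
```

Pooling over the time axis is what lets clips of different lengths map to a fixed-size vector before classification.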
📦 Installation Guide
First, install SpeechBrain with the following command:
pip install speechbrain
Please note that we encourage you to read our tutorials and learn more about SpeechBrain.
💻 Usage Examples
Basic Usage
import torchaudio
from speechbrain.inference.classifiers import EncoderClassifier

# Fetch the pretrained model and cache it in the local savedir
classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/urbansound8k_ecapa")
# classify_file returns per-class scores, the best score, its index, and the predicted label
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
print(text_lab)
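The system is trained with recordings sampled at 16 kHz (single channel). When you call classify_file, the code automatically normalizes the audio (i.e., resampling and mono channel selection). If you use encode_batch and classify_batch, make sure your input tensor complies with the expected sampling frequency.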
Advanced Usage
import torchaudio
from speechbrain.inference.classifiers import EncoderClassifier

# run_opts={"device":"cuda"} moves the model to the GPU for inference
classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/urbansound8k_ecapa", run_opts={"device":"cuda"})
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
print(text_lab)
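
Hard-coding "cuda" fails on machines without a GPU. A small sketch of choosing the device at runtime, assuming only that PyTorch is installed:

```python
import torch

# Fall back to CPU when no CUDA device is visible to PyTorch.
device = "cuda" if torch.cuda.is_available() else "cpu"
run_opts = {"device": device}
print(run_opts)
```

The resulting dictionary can be passed as run_opts=run_opts to EncoderClassifier.from_hparams in place of the literal value.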
Training the Model
The model was trained with SpeechBrain (8cab8b0c). To train it from scratch, follow these steps:
- Clone SpeechBrain:
git clone https://github.com/speechbrain/speechbrain/
- Install it:
cd speechbrain
pip install -r requirements.txt
pip install -e .
- Run the training:
cd recipes/UrbanSound8k/SoundClassification
python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
You can find our training results (models, logs, etc.) here.
📄 License
This project is released under the Apache-2.0 license.
🔍 References
Citing ECAPA
@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
author = {Brecht Desplanques and
Jenthe Thienpondt and
Kris Demuynck},
editor = {Helen Meng and
Bo Xu and
Thomas Fang Zheng},
title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
in {TDNN} Based Speaker Verification},
booktitle = {Interspeech 2020},
pages = {3830--3834},
publisher = {{ISCA}},
year = {2020},
}
Citing UrbanSound
@inproceedings{Salamon:UrbanSound:ACMMM:14,
Author = {Salamon, J. and Jacoby, C. and Bello, J. P.},
Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)},
Month = {Nov.},
Pages = {1041--1044},
Title = {A Dataset and Taxonomy for Urban Sound Research},
Year = {2014}}
Citing SpeechBrain
Please cite SpeechBrain if you use it in your research or business:
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
⚠️ Important Note
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.