urbansound8k_ecapa open-source sound recognition model - free to deploy, recognizes 10 types of urban environmental sounds


Urbansound8k Ecapa

Developed by SpeechBrain

A sound recognition model pretrained on the UrbanSound8k dataset with the SpeechBrain framework. It recognizes 10 types of urban environmental sounds.

Audio classification

PyTorch

English · License: Apache-2.0 · #urban-sound-classification #ECAPA-TDNN-model #high-accuracy-audio-recognition

Downloads: 91

Published: 3/2/2022

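Model overview

The system combines an ECAPA model with statistics pooling, and a classifier on top assigns each clip to a sound class. It is intended mainly for urban environmental sound recognition.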

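Model features

ECAPA-TDNN architecture

Uses the ECAPA-TDNN architecture (Emphasized Channel Attention, Propagation and Aggregation in TDNN), which performs well on speaker verification tasks.

Efficient classification

Classifies audio clips into 10 urban sound categories, with 75.5% single-fold accuracy on the test set.

Pretrained model

Provides a model pretrained on the UrbanSound8k dataset that can be used directly.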
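Much of the "emphasized channel attention" in ECAPA-TDNN comes from squeeze-and-excitation (SE) blocks, which rescale each feature channel using a summary of the whole utterance. The pure-Python sketch below illustrates only that gating idea on toy numbers. The weights are made up, and this is not SpeechBrain's implementation, which uses PyTorch layers inside SE-Res2Blocks.

```python
import math

def squeeze_excitation(x, w1, b1, w2, b2):
    """Rescale each channel of x (channels x frames) by a gate in (0, 1).

    x:  list of C channels, each a list of T frame values
    w1: bottleneck weights (R x C); w2: expansion weights (C x R)
    """
    # Squeeze: average each channel over time -> one number per channel
    s = [sum(ch) / len(ch) for ch in x]
    # Excitation, part 1: bottleneck layer followed by ReLU
    z = [max(0.0, sum(w * v for w, v in zip(row, s)) + b) for row, b in zip(w1, b1)]
    # Excitation, part 2: expansion layer followed by sigmoid, one gate per channel
    gates = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, z)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every frame of channel c by gate c
    return [[v * g for v in ch] for ch, g in zip(x, gates)], gates

# Toy input: 2 channels, 3 frames, bottleneck of size 1 (hypothetical weights)
x = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
out, gates = squeeze_excitation(x, w1=[[1.0, 1.0]], b1=[0.0],
                                w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
print(gates)
```

With these weights the first channel is kept mostly intact (gate near 0.88) while the second is suppressed (gate near 0.12), which is the "emphasis" the architecture name refers to.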

Model capabilities

Environmental sound recognition

Audio classification

Sound feature extraction
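For sound feature extraction, SpeechBrain's EncoderClassifier provides encode_batch, which returns a fixed-size embedding per clip. One common way to compare two clips is the cosine similarity of their embeddings. The sketch below assumes the embeddings have already been converted to plain Python lists; the vectors are hypothetical, and real ECAPA embeddings are much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings for three clips
emb_bark_1 = [0.9, 0.1, 0.0, 0.2]
emb_bark_2 = [0.8, 0.2, 0.1, 0.2]
emb_siren = [0.0, 0.1, 0.9, -0.3]
print(cosine_similarity(emb_bark_1, emb_bark_2))  # close to 1: similar clips
print(cosine_similarity(emb_bark_1, emb_siren))   # near 0: dissimilar clips
```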

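Use cases

Urban environment monitoring

Noise pollution monitoring

Identifies noise sources in the city, such as drilling and car horns.

Accuracy: 75.5% (single-fold test accuracy on UrbanSound8k)

Smart home

Abnormal sound detection

Detects unusual sounds at home, such as a dog barking or children playing.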
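In monitoring scenarios like these, audio usually arrives as a long recording, while UrbanSound8k clips are at most 4 seconds long. One simple approach, assumed here rather than prescribed by this model card, is to cut the recording into fixed windows and classify each window separately. The sketch computes window boundaries in samples at the model's 16 kHz rate; the 4-second window and 2-second hop are illustrative choices, and window_bounds is a hypothetical helper.

```python
def window_bounds(num_samples, sample_rate=16000, win_s=4.0, hop_s=2.0):
    """Return (start, end) sample indices of overlapping windows over a recording.

    The last window is clipped to the end of the recording, so it may be shorter.
    """
    win = int(win_s * sample_rate)
    hop = int(hop_s * sample_rate)
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + win, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # reached the end of the recording
            break
        start += hop
    return bounds

# 10 seconds of 16 kHz audio -> windows starting at 0 s, 2 s, 4 s and 6 s
for start, end in window_bounds(10 * 16000):
    print(start / 16000, end / 16000)
```

Each (start, end) pair could then be used to slice the waveform before classification, and per-window labels can be counted over time to track how often each noise source occurs.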

🚀 Sound recognition with ECAPA embeddings on UrbanSound8k

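This repository provides all the tools needed to perform sound recognition with SpeechBrain using a model pretrained on UrbanSound8k. You can download the dataset here. The system recognizes the following 10 sound classes: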

dog_bark, children_playing, air_conditioner, street_music, gun_shot, siren, engine_idling, jackhammer, drilling, car_horn

For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the test set is:

Release date	Single-fold accuracy (%)
04-06-21	75.5
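UrbanSound8k is distributed in 10 predefined folds, so "single-fold accuracy" means the score was measured on one held-out fold rather than averaged over a 10-fold cross-validation. The sketch below shows the arithmetic: accuracy is the percentage of clips in the test fold whose predicted label matches the reference label. The records and the choice of fold 10 as the test fold are hypothetical.

```python
def fold_accuracy(records, test_fold):
    """Percentage of clips in `test_fold` whose prediction equals the reference label.

    records: iterable of (fold, reference_label, predicted_label) tuples
    """
    in_fold = [(ref, pred) for fold, ref, pred in records if fold == test_fold]
    if not in_fold:
        raise ValueError(f"no clips found for fold {test_fold}")
    correct = sum(ref == pred for ref, pred in in_fold)
    return 100.0 * correct / len(in_fold)

# Hypothetical predictions; only the fold-10 rows are scored
records = [
    (10, "dog_bark", "dog_bark"),
    (10, "siren", "siren"),
    (10, "drilling", "jackhammer"),
    (10, "car_horn", "car_horn"),
    (3, "gun_shot", "street_music"),  # ignored: not in the test fold
]
print(fold_accuracy(records, test_fold=10))  # → 75.0
```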

🚀 Quick start

This system combines an ECAPA model with statistics pooling. On top of it, a classifier trained with categorical cross-entropy loss is applied.
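To make this pipeline concrete: statistics pooling turns a variable-length sequence of frame-level features into one fixed-size vector by concatenating the per-dimension mean and standard deviation over time, and the classifier maps that vector to class scores trained with categorical cross-entropy. The pure-Python sketch below shows plain (unweighted) statistics pooling, a single linear layer, softmax and the cross-entropy loss on toy numbers. ECAPA-TDNN itself uses an attentive variant of the pooling and a much deeper network, and the weights here are hypothetical.

```python
import math

def statistics_pooling(frames):
    """frames: T lists of D features -> list of 2*D values (means, then std devs)."""
    t = len(frames)
    dims = list(zip(*frames))  # D tuples, each holding the T values of one feature
    means = [sum(d) / t for d in dims]
    # Population standard deviation (divides by T)
    stds = [math.sqrt(sum((v - m) ** 2 for v in d) / t) for d, m in zip(dims, means)]
    return means + stds

def linear(x, weights, bias):
    """One score per class; weights has shape (num_classes x len(x))."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Categorical cross-entropy for one example with an integer class target."""
    return -math.log(probs[target_index])

# Toy input: 3 frames of 2-dimensional features (made-up values), 2 classes
frames = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
pooled = statistics_pooling(frames)  # [mean_0, mean_1, std_0, std_1]
scores = linear(pooled, weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], bias=[0.0, 0.0])
probs = softmax(scores)
loss = cross_entropy(probs, target_index=0)
print(pooled, probs, loss)
```

Because pooling always yields 2×D values regardless of the number of frames, clips of different lengths end up as vectors of the same size that a single classifier can handle.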

📦 Installation

First, install SpeechBrain with the following command:

pip install speechbrain

We encourage you to read our tutorials to learn more about SpeechBrain.

💻 Usage examples

Basic usage

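# SpeechBrain >= 1.0 import path (older releases exposed it as speechbrain.pretrained)
from speechbrain.inference.classifiers import EncoderClassifier
# Download the pretrained model from the Hub and cache it under savedir
classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/urbansound8k_ecapa")
# Classify the example clip dog_bark.wav from the model repository
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
# text_lab holds the predicted class label(s) as strings
print(text_lab)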
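classify_file reports only the best-scoring class in text_lab, but it can be useful to look at the runner-up classes too. The sketch below ranks a score vector in plain Python. It assumes the scores have been converted to a list (for example with out_prob.squeeze().tolist()) and that the label order matches the model's label encoder; both the scores and the label order shown are hypothetical. Depending on the classifier head, the scores may not be calibrated probabilities, so the sketch only ranks them.

```python
def top_k(scores, labels, k=3):
    """Return the k (label, score) pairs with the highest scores, best first."""
    if len(scores) != len(labels):
        raise ValueError("need exactly one label per score")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical label order and scores for one clip
labels = ["dog_bark", "children_playing", "air_conditioner", "street_music", "gun_shot",
          "siren", "engine_idling", "jackhammer", "drilling", "car_horn"]
scores = [0.71, 0.12, 0.01, 0.05, 0.02, 0.03, 0.01, 0.02, 0.02, 0.01]
for label, score in top_k(scores, labels):
    print(f"{label}: {score:.2f}")
```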

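The system was trained on recordings sampled at 16 kHz (single channel). When you call classify_file, the code automatically normalizes your audio if needed (i.e., resampling and mono channel selection). If you use encode_batch and classify_batch, make sure your input tensor matches the expected sampling rate.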
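Since encode_batch and classify_batch expect input that is already 16 kHz mono, it helps to see what that preparation involves. The sketch below works on plain Python lists: it averages the channels into mono, then resamples by linear interpolation. This naive resampler is a stand-in chosen for readability and is not what SpeechBrain or torchaudio do; it has no anti-aliasing filter, so in practice a proper resampler such as torchaudio.functional.resample is the better choice. The function names are hypothetical.

```python
def to_mono(channels):
    """Average equal-length channels (a list of lists) into a single channel."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]

def resample_linear(signal, orig_sr, target_sr=16000):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if orig_sr == target_sr or len(signal) < 2:
        return list(signal)
    out_len = int(len(signal) * target_sr / orig_sr)
    out = []
    for i in range(out_len):
        pos = i * orig_sr / target_sr  # position in the input, in input samples
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1.0 - frac) + signal[right] * frac)
    return out

# Hypothetical stereo clip at 32 kHz: 8 samples per channel
stereo = [[0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4],
          [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
mono = to_mono(stereo)
mono_16k = resample_linear(mono, orig_sr=32000)
print(len(mono_16k))  # → 4
```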

Advanced usage

# To run inference on the GPU, pass run_opts={"device": "cuda"} to from_hparams
from speechbrain.inference.classifiers import EncoderClassifier
classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/urbansound8k_ecapa", run_opts={"device": "cuda"})
# Classification then runs on the GPU; the call is the same as on the CPU
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
print(text_lab)
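Hard-coding "cuda" fails on machines without a GPU. A small helper, hypothetical and not part of SpeechBrain, can choose the device at runtime and build the run_opts dictionary used above. It relies only on torch.cuda.is_available() and falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # no PyTorch: nothing can run on the GPU anyway
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

# Pass this as run_opts=run_opts to EncoderClassifier.from_hparams
run_opts = {"device": pick_device()}
print(run_opts)
```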

Training the model

The model was trained with SpeechBrain (8cab8b0c). To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training:

cd recipes/UrbanSound8k/SoundClassification
python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
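Before launching train.py, a quick check that --data_folder points at the extracted dataset can save a failed run. The sketch below assumes the layout of the public UrbanSound8K archive: an audio/ directory with fold1 to fold10 subfolders plus metadata/UrbanSound8K.csv. That layout, and whether the recipe expects exactly this folder as data_folder, are assumptions to verify against the recipe itself; the function name is hypothetical.

```python
from pathlib import Path

def missing_urbansound8k_parts(data_folder):
    """List expected dataset paths that are absent (an empty list means the layout looks complete)."""
    root = Path(data_folder)
    expected = [root / "metadata" / "UrbanSound8K.csv"]
    expected += [root / "audio" / f"fold{i}" for i in range(1, 11)]
    return [str(p.relative_to(root)) for p in expected if not p.exists()]

# Example: print(missing_urbansound8k_parts("your_data_folder"))
```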

You can find our training results (models, logs, etc.) here.


📄 License

This project is licensed under Apache-2.0.

🔍 References

Citing ECAPA

@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
  author    = {Brecht Desplanques and
               Jenthe Thienpondt and
               Kris Demuynck},
  editor    = {Helen Meng and
               Bo Xu and
               Thomas Fang Zheng},
  title     = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
               in {TDNN} Based Speaker Verification},
  booktitle = {Interspeech 2020},
  pages     = {3830--3834},
  publisher = {{ISCA}},
  year      = {2020},
}

Citing UrbanSound

@inproceedings{Salamon:UrbanSound:ACMMM:14,
    Author = {Salamon, J. and Jacoby, C. and Bello, J. P.},
    Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)},
    Month = {Nov.},
    Pages = {1041--1044},
    Title = {A Dataset and Taxonomy for Urban Sound Research},
    Year = {2014}}

Citing SpeechBrain

Please cite SpeechBrain if you use it in your research or business:

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}