sepformer_rescuespeech open-source speech enhancement model - denoising German speech in rescue scenarios

Sepformer Rescuespeech

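Developed by SpeechBrain

A speech enhancement model based on the SepFormer architecture that denoises German speech recorded in rescue scenarios. It works on audio sampled at 16 kHz.

Audio enhancement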

PyTorch

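Language: German · License: Apache-2.0 · #rescue-speech-enhancement #SepFormer-architecture #German-speech-processing

Downloads: 62

Released: 6/30/2023

Model overview

The model uses the SepFormer architecture for speech enhancement. It is first pretrained on the Microsoft-DNS 4 dataset and then fine-tuned on the RescueSpeech dataset to improve speech quality in noisy environments.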

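Model features

Optimized for rescue scenarios

Fine-tuned on speech recorded in rescue scenarios to improve enhancement in noisy environments

High-performance architecture

Built on SepFormer, a Transformer-based architecture for speech separation, applied here to enhancement

Evaluated on multiple metrics

Improves several speech quality metrics, including SI-SNR, SI-SDR, and PESQ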

Model capabilities

Speech denoising

Speech enhancement

Speech processing for rescue scenarios

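Use cases

Emergency rescue

Rescue communication enhancement

Improves the quality of voice communication in noisy rescue environments

PESQ rises to 2.24 and SI-SNR improves by 7.849 dB

Speech processing

Speech quality improvement

Enhances low-quality speech

SI-SDR improves by 8.414 dB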

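🚀 SepFormer speech enhancement model trained on the RescueSpeech dataset (16k sampling frequency)

This repository provides all the tools needed to perform speech enhancement (denoising) with a SepFormer model implemented in SpeechBrain. The model is first pretrained on the Microsoft-DNS 4 dataset and then fine-tuned on the RescueSpeech dataset at a 16k sampling frequency. For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the RescueSpeech test set is given below.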

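🚀 Quick start

This project provides tools for speech enhancement (denoising) with a SepFormer-based model. The model is pretrained and then fine-tuned on the datasets listed below, and it is evaluated on the RescueSpeech test set.

✨ Main features

Model architecture: SepFormer, implemented in SpeechBrain.
Training data: pretrained on the Microsoft-DNS 4 dataset, then fine-tuned on the RescueSpeech dataset at a 16k sampling frequency.
Performance: on the RescueSpeech test set, SI-SNRi is 7.849 dB, SI-SDRi is 8.414 dB, and PESQ is 2.24.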

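Model performance

Release	Test-set SI-SNRi (dB)	Test-set SI-SDRi (dB)	Test-set PESQ
07-01-23	7.849	8.414	2.24

Here, SI-SNRi and SI-SDRi denote the improvement in SI-SNR and SI-SDR, respectively, that is, the gain of the enhanced signal over the unprocessed noisy input.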
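To make the "improvement" metrics concrete, here is a minimal pure-Python sketch of SI-SNR and SI-SNRi using the common textbook definition. It is not the evaluation code used to produce the table; the function names `si_snr` and `si_snr_improvement` and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to isolate the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything in the estimate that is not the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(enhanced, noisy, clean):
    """SI-SNRi: SI-SNR of the enhanced signal minus SI-SNR of the noisy input."""
    return si_snr(enhanced, clean) - si_snr(noisy, clean)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is what "scale-invariant" refers to.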

📦 Installation

First, install SpeechBrain with the following command:

pip install speechbrain

We also recommend reading the SpeechBrain tutorials to learn more about the toolkit.

💻 Usage examples

Basic usage

Run speech enhancement on your own audio file:

from speechbrain.inference.separation import SepformerSeparation as separator
import torchaudio

model = separator.from_hparams(source="speechbrain/sepformer_rescuespeech", savedir='pretrained_models/sepformer_rescuespeech')

# For a custom file, change the path below
est_sources = model.separate_file(path='speechbrain/sepformer_rescuespeech/example_rescuespeech16k.wav')

torchaudio.save("enhanced_rescuespeech16k.wav", est_sources[:, :, 0].detach().cpu(), 16000)
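
Since the model is trained on 16 kHz audio and the example writes its output at 16000 Hz, it can help to check the sampling rate of a custom file before enhancing it. The following is a minimal sketch using only the standard-library `wave` module, so it handles PCM WAV files only; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of SpeechBrain.

```python
import wave

EXPECTED_RATE = 16000  # sampling rate stated for this model


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """True if the file's sampling rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling("my_recording.wav")` returns True (the file name is a placeholder), the recording was not captured at 16 kHz and would need to be resampled to match the rate the model was trained on.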

Advanced usage

Inference on GPU: pass run_opts={"device":"cuda"} when calling from_hparams.

You can find our training results (models, logs, etc.) here.
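
As a sketch of how the `run_opts` option might be wired in, the snippet below builds the dictionary and forwards it to `from_hparams`. The helper names `build_run_opts` and `load_model` are hypothetical, and the CPU fallback value is an assumption; only `run_opts={"device":"cuda"}` comes from the instructions above.

```python
def build_run_opts(use_gpu):
    """Return the run_opts dict to pass to from_hparams."""
    # "cuda" is the value given in this model card; "cpu" is an assumed fallback.
    return {"device": "cuda"} if use_gpu else {"device": "cpu"}


def load_model(source, savedir, use_gpu=False):
    """Load the enhancement model on the requested device."""
    # Imported lazily so this helper module loads even without SpeechBrain installed.
    from speechbrain.inference.separation import SepformerSeparation

    return SepformerSeparation.from_hparams(
        source=source,
        savedir=savedir,
        run_opts=build_run_opts(use_gpu),
    )
```

Call `load_model` with the same `source` and `savedir` values as in the basic example. Passing `use_gpu=torch.cuda.is_available()` is one way to fall back to the CPU on machines without a CUDA device.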

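Limitations

The SpeechBrain team does not provide any guarantee on the performance of this model when it is used on other datasets.

📚 Documentation

Citing SpeechBrain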

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}

Citing SepFormer

@inproceedings{subakan2021attention,
      title={Attention is All You Need in Speech Separation}, 
      author={Cem Subakan and Mirco Ravanelli and Samuele Cornell and Mirko Bronzi and Jianyuan Zhong},
      year={2021},
      booktitle={ICASSP 2021}
}

Citing RescueSpeech

@misc{sagar2023rescuespeech,
    title={RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain},
    author={Sangeet Sagar and Mirco Ravanelli and Bernd Kiefer and Ivana Kruijff Korbayova and Josef van Genabith},
    year={2023},
    eprint={2306.04054},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

📄 License

This project is released under the Apache-2.0 license.

About SpeechBrain

Website: https://speechbrain.github.io/
Code repository: https://github.com/speechbrain/speechbrain/
HuggingFace: https://huggingface.co/speechbrain/

Model information

Property	Details
Model type	SepFormer
Training data	Pretrained on the Microsoft-DNS 4 dataset, then fine-tuned on the RescueSpeech dataset at a 16k sampling frequency
Evaluation metrics	SI-SNR, PESQ, SDR
Test-set performance	SI-SNRi 7.849 dB, SI-SDRi 8.414 dB, PESQ 2.24