SepFormer-DNS4-16k Enhancement: Open-Source Speech Enhancement Model


Developed by speechbrain

This is a speech enhancement model based on the SepFormer architecture, designed for denoising. It was trained on the Microsoft DNS-4 dataset and operates on audio sampled at 16 kHz.

Audio Enhancement

PyTorch

Supports Multiple Languages | Open Source License: Apache-2.0 | #Speech Denoising #Real-time Enhancement #Multilingual Support


Release Time: 8/6/2023

Model Overview

The model uses the SepFormer architecture for speech enhancement, primarily to remove background noise and improve speech quality. It was trained on 1,300 hours of the Microsoft DNS-4 dataset and is intended for audio sampled at 16 kHz.
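Because the model is intended for 16 kHz audio, it can help to check a file's sampling rate before enhancing it. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV input; `inspect_wav` and `needs_resampling` are hypothetical helper names, not part of SpeechBrain. Files at other rates can be resampled to 16 kHz beforehand (for example with `torchaudio.functional.resample`).

```python
import wave
from typing import Tuple

TARGET_RATE = 16_000  # sampling rate stated for this model


def inspect_wav(path: str) -> Tuple[int, int, float]:
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return rate, channels, frames / rate


def needs_resampling(path: str, target_rate: int = TARGET_RATE) -> bool:
    """True if the file's sampling rate differs from the rate the model expects."""
    rate, _, _ = inspect_wav(path)
    return rate != target_rate
```

For example, `needs_resampling("noisy.wav")` (a hypothetical file) returns `True` for an 8 kHz telephone recording and `False` for a 16 kHz one.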

Model Features

High-performance Denoising

On the DNS4 2022 baseline development set, the model achieves DNSMOS scores of 2.999 (SIG), 3.076 (BAK), and 2.437 (OVRL).

Multilingual Support

Supports multiple languages, including English, German, Russian, French, Italian, and Spanish.

Transformer-based Architecture

Uses the SepFormer architecture, a Transformer-based masking network originally proposed for speech separation and applied here to speech enhancement.
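SepFormer follows the dual-path framework: the encoder output is split into overlapping chunks, and the masking network alternates between an intra-chunk Transformer (dependencies within each chunk) and an inter-chunk Transformer (dependencies across chunks). The pure-Python sketch below illustrates only the chunking and overlap-add steps on a 1-D sequence, assuming 50% overlap. The function names are hypothetical, and this is not SpeechBrain's implementation, which works on batched multi-channel tensors.

```python
from typing import List, Sequence


def chunk_sequence(frames: Sequence[float], chunk_size: int) -> List[List[float]]:
    """Split a 1-D sequence into chunks of `chunk_size` with 50% overlap.

    The sequence is zero-padded so that every original frame falls into
    exactly two chunks.
    """
    if chunk_size < 2 or chunk_size % 2:
        raise ValueError("chunk_size must be an even integer >= 2")
    hop = chunk_size // 2
    seq = [float(x) for x in frames]
    seq += [0.0] * (-len(seq) % hop)       # make the length a multiple of hop
    seq = [0.0] * hop + seq + [0.0] * hop  # pad both ends by one hop
    return [seq[start:start + chunk_size]
            for start in range(0, len(seq) - chunk_size + 1, hop)]


def overlap_add(chunks: List[List[float]], original_length: int) -> List[float]:
    """Invert chunk_sequence: overlap-add the chunks and strip the padding."""
    chunk_size = len(chunks[0])
    hop = chunk_size // 2
    out = [0.0] * (hop * (len(chunks) - 1) + chunk_size)
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            out[i * hop + j] += value
    # Each original frame was summed from two chunks, so average them here.
    return [v / 2 for v in out[hop:hop + original_length]]
```

In SepFormer the two Transformers process the chunked representation before the overlap-add step; this sketch omits that processing and only shows that chunking followed by overlap-add (with averaging) returns the input unchanged.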

Model Capabilities

Audio Denoising

Speech Quality Enhancement

Background Noise Suppression

Use Cases

Voice Communication

VoIP Call Enhancement

Improves the quality of internet voice calls by reducing background noise interference

Can improve call clarity

Audio Post-processing

Recording Denoising

Reduces noise in field recordings to improve speech intelligibility

Enhances recording quality, making speech clearer

🚀 SepFormer trained on Microsoft DNS-4 (Deep Noise Suppression Challenge 4 – ICASSP 2022) for speech enhancement (16k sampling frequency)

This repository provides everything needed to perform speech enhancement (denoising) with a SepFormer model implemented in SpeechBrain. The model was trained on 1,300 hours of the Microsoft DNS-4 dataset at a 16 kHz sampling rate. To get the most out of it, we recommend learning more about SpeechBrain. DNSMOS results on the DNS4 2022 baseline dev set are listed under Model Evaluation Results.

✨ Features

Multilingual Support: Supports languages such as English, German, Russian, French, Italian, and Spanish.
Audio-to-Audio Task: Maps noisy input audio to enhanced output audio (speech enhancement).
Training Data: Trained on the Microsoft DNS-4 (Deep Noise Suppression) dataset.
Evaluation Metrics: SI-SNR, PESQ, and the DNSMOS scores SIG, BAK, and OVRL.
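SI-SNR (scale-invariant signal-to-noise ratio) is one of the listed metrics. Below is a minimal pure-Python sketch of its common definition, assuming two equal-length signals: both are made zero-mean, the estimate is projected onto the reference to obtain the target component, and the residual is treated as noise, giving SI-SNR = 10 log10(||s_target||² / ||e_noise||²). The `si_snr` name and the `eps` value are illustrative choices, not SpeechBrain's API.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    # Remove the DC offset from both signals.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference: s_target = <est, ref> / ||ref||^2 * ref
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Everything not explained by the reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the projection absorbs any gain, multiplying the estimate by a constant leaves the score unchanged, which is what makes the metric scale-invariant.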

📦 Installation

First of all, please install SpeechBrain with the following command:

pip install speechbrain

We encourage you to read the SpeechBrain tutorials to learn more about the toolkit.
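The usage examples on this page import from `speechbrain.inference.separation`, the module layout used by SpeechBrain 1.0 and later; earlier 0.5.x releases exposed the same class under `speechbrain.pretrained`. If you need to support both, a minimal sketch can choose the module from the installed version. `separation_module_for` and `load_sepformer_class` are hypothetical helper names, and the version boundary is an assumption based on the 1.0 module reorganization.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version


def separation_module_for(sb_version: str) -> str:
    """Return the module expected to expose SepformerSeparation for a SpeechBrain version."""
    major = int(sb_version.split(".")[0])
    # SpeechBrain 1.0 moved the pretrained interfaces to speechbrain.inference
    return "speechbrain.inference.separation" if major >= 1 else "speechbrain.pretrained"


def load_sepformer_class():
    """Import SepformerSeparation from the location matching the installed SpeechBrain."""
    try:
        installed = version("speechbrain")
    except PackageNotFoundError as err:
        raise RuntimeError("SpeechBrain is not installed; run: pip install speechbrain") from err
    module = importlib.import_module(separation_module_for(installed))
    return module.SepformerSeparation
```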

💻 Usage Examples

Basic Usage

from speechbrain.inference.separation import SepformerSeparation as separator
import torchaudio

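# Download the pretrained model from the Hugging Face Hub (cached in savedir)
model = separator.from_hparams(source="speechbrain/sepformer-dns4-16k-enhancement", savedir='pretrained_models/sepformer-dns4-16k-enhancement')

# Enhance the example file; for a custom file, change the path
est_sources = model.separate_file(path='speechbrain/sepformer-dns4-16k-enhancement/example_dns4-16k.wav')

# est_sources has shape [batch, time, n_sources]; keep the single enhanced source and save it at 16 kHz
torchaudio.save("enhanced_dns4-16k.wav", est_sources[:, :, 0].detach().cpu(), 16000)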
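To enhance many recordings, the same `from_hparams` and `separate_file` calls can be looped over a folder. Below is a minimal sketch, assuming a flat directory of `.wav` files. `list_wavs`, `enhanced_path`, and `enhance_folder` are hypothetical helpers, and the third-party imports are deferred into `enhance_folder` so the path helpers work without loading the model.

```python
from pathlib import Path
from typing import List


def list_wavs(in_dir: Path) -> List[Path]:
    """Return the .wav files directly inside in_dir, sorted by name."""
    return sorted(p for p in in_dir.glob("*.wav") if p.is_file())


def enhanced_path(src: Path, out_dir: Path) -> Path:
    """Map an input file to its output location, e.g. in/a.wav -> out/a_enhanced.wav."""
    return out_dir / f"{src.stem}_enhanced.wav"


def enhance_folder(in_dir: str, out_dir: str) -> None:
    """Enhance every .wav file in in_dir and write the results to out_dir at 16 kHz."""
    import torchaudio
    from speechbrain.inference.separation import SepformerSeparation as separator

    model = separator.from_hparams(
        source="speechbrain/sepformer-dns4-16k-enhancement",
        savedir="pretrained_models/sepformer-dns4-16k-enhancement",
    )
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in list_wavs(Path(in_dir)):
        est_sources = model.separate_file(path=str(src))  # [batch, time, n_sources]
        torchaudio.save(str(enhanced_path(src, out)), est_sources[:, :, 0].detach().cpu(), 16000)
```

For example, `enhance_folder("noisy_calls", "enhanced_calls")` (hypothetical folder names) loads the model once and reuses it for every file.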

Advanced Usage

To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
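A fuller sketch of GPU inference, assuming SpeechBrain 1.0 or later and a CUDA-enabled PyTorch build. It falls back to the CPU when no GPU is visible, downmixes to mono and resamples to 16 kHz, then calls `separate_batch`, which takes a `[batch, time]` tensor instead of a file path. `make_run_opts` and `enhance_on_best_device` are hypothetical helper names.

```python
def make_run_opts(use_gpu: bool) -> dict:
    """Build the run_opts dict passed to from_hparams ("cuda" selects the default GPU)."""
    return {"device": "cuda" if use_gpu else "cpu"}


def enhance_on_best_device(in_path: str, out_path: str) -> None:
    """Enhance one file on the GPU if available, otherwise on the CPU."""
    import torch
    import torchaudio
    from speechbrain.inference.separation import SepformerSeparation as separator

    run_opts = make_run_opts(torch.cuda.is_available())
    model = separator.from_hparams(
        source="speechbrain/sepformer-dns4-16k-enhancement",
        savedir="pretrained_models/sepformer-dns4-16k-enhancement",
        run_opts=run_opts,
    )
    waveform, sample_rate = torchaudio.load(in_path)  # shape [channels, time]
    if sample_rate != 16000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
    # Downmix to mono and move to the model's device: shape [1, time]
    mixture = waveform.mean(dim=0, keepdim=True).to(run_opts["device"])
    with torch.no_grad():
        est_sources = model.separate_batch(mixture)  # [batch, time, n_sources]
    torchaudio.save(out_path, est_sources[:, :, 0].cpu(), 16000)
```

For example, `enhance_on_best_device("noisy.wav", "enhanced.wav")` (hypothetical file names) runs on the GPU when one is available and on the CPU otherwise.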

📚 Documentation

Model Evaluation Results

Release	SIG	BAK	OVRL
08-01-23	2.999	3.076	2.437

DNSMOS (Deep Noise Suppression Mean Opinion Score) is a non-intrusive evaluation metric: it needs only the processed audio, not a clean reference. It estimates three scores, SIG (speech quality), BAK (background noise quality), and OVRL (overall quality), each on a scale of 1 to 5, with 5 being the best quality.

Model Index

Property	Details
Model Name	sepformer-dns4-16k-enhancement
Task	Speech Enhancement
Dataset	DNS-4 (deep-noise-suppression-challenge-icassp-2022)
Split	baseline-dev-set
Language	de
DNSMOS SIG	2.999
DNSMOS BAK	3.076
DNSMOS OVRL	2.437

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

🔧 Technical Details

The model is a SepFormer implemented with SpeechBrain and trained on 1,300 hours of the Microsoft DNS-4 dataset at a 16 kHz sampling rate.

📄 License

This project is licensed under the Apache-2.0 license.

Referencing

Referencing SpeechBrain

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}

Referencing SepFormer

@inproceedings{subakan2021attention,
      title={Attention is All You Need in Speech Separation}, 
      author={Cem Subakan and Mirco Ravanelli and Samuele Cornell and Mirko Bronzi and Jianyuan Zhong},
      year={2021},
      booktitle={ICASSP 2021}
}

Referencing ICASSP 2022 Deep Noise Suppression Challenge

@inproceedings{dubey2022icassp,
  title={ICASSP 2022 Deep Noise Suppression Challenge},
  author={Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Matusevych, Sergiy and Braun, Sebastian and Eskimez, Emre Sefik and Thakker, Manthan and Yoshioka, Takuya and Gamper, Hannes and Aichner, Robert},
  booktitle={ICASSP},
  year={2022}
}

About SpeechBrain

Website: https://speechbrain.github.io/
Code: https://github.com/speechbrain/speechbrain/
HuggingFace: https://huggingface.co/speechbrain/
