DPTNet_Libri1Mix_enhsingle_16k Open-source Audio Enhancement Model

Developed by JorisCos

Speech enhancement model trained with the Asteroid framework for single-channel (mono) speech enhancement

Audio Enhancement

PyTorch

#Mono speech enhancement #16kHz audio processing #DPTNet architecture

Downloads 4,446

Release Time : 3/2/2022

Model Overview

This model uses the DPTNet architecture and is trained on the enh_single (single-speaker enhancement) task of the Libri1Mix dataset. It aims to improve the clarity and intelligibility of mono speech recordings.

Model Features

Efficient audio processing

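Trained on 16 kHz audio in 3-second segments. The mask network is configured as bidirectional, so it relies on future context and is better suited to offline or chunked processing than to strict low-latency streaming.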

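Dual-path transformer network

Uses the DPTNet (Dual-Path Transformer Network) architecture: a learned time-domain filterbank encodes the waveform, and the encoded sequence is split into overlapping chunks that are modeled by intra-chunk and inter-chunk transformer layers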

Significant performance improvement

Achieves an SI-SDR improvement of 11.38 dB and an STOI improvement of 0.13 on the Libri1Mix min test set

Model Capabilities

Mono speech enhancement

Audio quality improvement

Speech clarity enhancement

Use Cases

Voice communication

Voice call quality enhancement

Improves voice call clarity in noisy environments

SI-SDR improvement of 11.38 dB over the unprocessed input on the Libri1Mix min test set

Audio post-processing

Recording quality restoration

Enhances low-quality recordings

STOI of 0.93 on the Libri1Mix min test set (an improvement of 0.13), where 1.0 is the maximum score

🚀 Asteroid model `JorisCos/DPTNet_Libri1Mix_enhsingle_16k`

This model performs single-channel speech enhancement. It was trained on the enh_single task of the Libri1Mix dataset using the librimix recipe in Asteroid.

🚀 Quick Start

This section provides an overview of the model, its training details, and performance results.
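A minimal usage sketch, assuming Asteroid is installed (`pip install asteroid`) and that the checkpoint is available on the Hugging Face Hub under the ID shown in the page title. `enhance_file` is a hypothetical helper name. `BaseModel.from_pretrained` and `separate` are Asteroid calls, but the output-file behaviour described in the comments should be checked against the installed Asteroid version.

```python
# Hub ID taken from the page title; the sample rate comes from the training config.
MODEL_ID = "JorisCos/DPTNet_Libri1Mix_enhsingle_16k"
EXPECTED_SAMPLE_RATE = 16000


def enhance_file(wav_path: str) -> None:
    """Enhance one mono WAV file with the pretrained model (hypothetical helper)."""
    # Imported lazily so this module can be imported without Asteroid installed.
    from asteroid.models import BaseModel

    # Downloads the checkpoint on first use (requires network access).
    model = BaseModel.from_pretrained(MODEL_ID)

    # Given a file path, Asteroid is expected to write the enhanced signal
    # next to the input file. resample=True asks it to resample inputs
    # whose rate differs from the model's 16 kHz.
    model.separate(wav_path, resample=True)
```

Calling `enhance_file("noisy.wav")` on a local recording would then run the full download, load, and enhancement path.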

✨ Features

Audio Enhancement: Trained for the enh_single task on the Libri1Mix dataset, it enhances single-source (one-speaker) audio.
Based on Asteroid: Trained with the librimix recipe in the Asteroid framework.

📚 Documentation

Model Description

This model was trained by Joris Cosentino using the librimix recipe in Asteroid. It was trained on the enh_single task of the Libri1Mix dataset.

Training Configuration

data:
  n_src: 1
  sample_rate: 16000
  segment: 3
  task: enh_single
  train_dir: data/wav16k/min/train-360
  valid_dir: data/wav16k/min/dev
filterbank:
  kernel_size: 16
  n_filters: 64
  stride: 8
masknet:
  bidirectional: true
  chunk_size: 100
  dropout: 0
  ff_activation: relu
  ff_hid: 256
  hop_size: 50
  in_chan: 64
  mask_act: sigmoid
  n_repeats: 2
  n_src: 1
  norm_type: gLN
  out_chan: 64
optim:
  lr: 0.001
  optimizer: adam
  weight_decay: 1.0e-05
scheduler:
  d_model: 64
  steps_per_epoch: 10000
training:
  batch_size: 4
  early_stop: true
  epochs: 200
  gradient_clipping: 5
  half_lr: true
  num_workers: 4
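
The filterbank and chunking values in this configuration translate into concrete time scales. The sketch below works them out, assuming unpadded framing; the actual Asteroid encoder and chunking may pad the signal, which would change the frame count slightly. The function names are illustrative and not part of Asteroid.

```python
# Values copied from the training configuration.
SAMPLE_RATE = 16000  # data.sample_rate (Hz)
SEGMENT_S = 3        # data.segment (seconds)
KERNEL_SIZE = 16     # filterbank.kernel_size (samples)
STRIDE = 8           # filterbank.stride (samples)
CHUNK_SIZE = 100     # masknet.chunk_size (encoder frames)
HOP_SIZE = 50        # masknet.hop_size (encoder frames)


def n_frames(n_samples: int, kernel: int, stride: int) -> int:
    """Number of encoder frames for a signal, assuming no padding."""
    return (n_samples - kernel) // stride + 1


def chunk_span_samples(chunk: int, kernel: int, stride: int) -> int:
    """Number of waveform samples covered by one chunk of encoder frames."""
    return (chunk - 1) * stride + kernel


segment_samples = SAMPLE_RATE * SEGMENT_S                 # samples per training segment
frames = n_frames(segment_samples, KERNEL_SIZE, STRIDE)   # encoder frames per segment
kernel_ms = 1000 * KERNEL_SIZE / SAMPLE_RATE              # analysis window length
stride_ms = 1000 * STRIDE / SAMPLE_RATE                   # hop between windows
chunk_ms = 1000 * chunk_span_samples(CHUNK_SIZE, KERNEL_SIZE, STRIDE) / SAMPLE_RATE
chunk_overlap = 1 - HOP_SIZE / CHUNK_SIZE                 # fraction shared by adjacent chunks

print(segment_samples, frames, kernel_ms, stride_ms, chunk_ms, chunk_overlap)
```

Under these assumptions, each encoder frame covers 1 ms of audio with a 0.5 ms hop, and each chunk of 100 frames spans roughly 50 ms with 50% overlap between neighbouring chunks.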

Results

On Libri1Mix min test set:

si_sdr: 14.829670037349064
si_sdr_imp: 11.379888731489366
sdr: 15.395712644737149
sdr_imp: 11.893049845524112
sir: Infinity
sir_imp: NaN
sar: 15.395712644737149
sar_imp: 11.893049845524112
stoi: 0.9301948391058859
stoi_imp: 0.13427501556534832
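
To make these numbers easier to read, the sketch below gives a plain-Python version of the usual scale-invariant SDR definition and recovers the scores of the unprocessed input. It assumes that each `*_imp` value is the output metric minus the same metric measured on the noisy input. It is an illustration, not the evaluation code used to produce the table. The `sir: Infinity` and `sir_imp: NaN` entries are plausibly a consequence of having a single source (no interference term, and infinity minus infinity is undefined).

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sequences of floats."""
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]   # zero-mean estimate
    r = [x - mean_r for x in reference]  # zero-mean reference

    ref_energy = sum(x * x for x in r)
    if ref_energy == 0:
        return -math.inf                 # silent reference: no target component
    # Project the estimate onto the reference to get the scaled target.
    alpha = sum(a * b for a, b in zip(e, r)) / ref_energy
    target = [alpha * x for x in r]
    noise = [a - t for a, t in zip(e, target)]

    target_energy = sum(x * x for x in target)
    noise_energy = sum(x * x for x in noise)
    if noise_energy == 0:
        return math.inf                  # perfect estimate up to scale
    if target_energy == 0:
        return -math.inf                 # estimate orthogonal to the reference
    return 10 * math.log10(target_energy / noise_energy)


# Values copied from the results listed above.
reported = {
    "si_sdr": 14.829670037349064,
    "si_sdr_imp": 11.379888731489366,
    "stoi": 0.9301948391058859,
    "stoi_imp": 0.13427501556534832,
}

# Assumption: improvement = metric(enhanced) - metric(noisy input).
input_si_sdr = reported["si_sdr"] - reported["si_sdr_imp"]  # about 3.45 dB
input_stoi = reported["stoi"] - reported["stoi_imp"]        # about 0.796

# With an infinite SIR before and after, the difference is not a number.
sir_imp = math.inf - math.inf
```

Read this way, the model raises the average SI-SDR from about 3.45 dB to about 14.83 dB and the STOI from about 0.80 to about 0.93.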

📄 License

This work "DPTNet_Libri1Mix_enhsingle_16k" is a derivative of LibriSpeech ASR corpus by Vassil Panayotov, used under CC BY 4.0; of The WSJ0 Hipster Ambient Mixtures dataset by Whisper.ai, used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) (Research only). "DPTNet_Libri1Mix_enhsingle_16k" is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.

Property	Details
Tags	asteroid, audio, DPTNet, audio-to-audio
Datasets	Libri1Mix, enh_single
License	cc-by-sa-4.0
