DPRNNTasNet-ks2_WHAM_sepclean: an open-source speech separation model for clean two-speaker mixtures

DPRNNTasNet-ks2_WHAM_sepclean

Developed by mpariente

A speech separation model trained with the Asteroid framework on the WHAM! dataset, targeting the clean speech separation task (sep_clean).

Sound Separation

PyTorch

#Speech Separation #Dual-Channel Audio Processing #High-Fidelity Reconstruction

Downloads 512

Release Time: 3/2/2022

Model Overview

This model uses the DPRNN architecture to separate the individual speech signals in a two-speaker mixture (n_src: 2 in the training config). It was trained on the sep_clean task of WHAM!, so it is best suited to separating overlapping speech.

Model Features

Efficient Speech Separation

Utilizes the DPRNN architecture to effectively process long-sequence audio signals, achieving high-quality speech separation.

Low Sample Rate Support

Expects audio sampled at 8000 Hz, the rate used for training. Audio at other rates should be resampled to 8000 Hz first.

Lightweight Design

Features a lightweight design with a kernel size of 2 and 64 filters, balancing performance and computational efficiency.

Model Capabilities

Audio Separation

Speech Enhancement

Multi-Speaker Separation

Use Cases

Speech Processing

Conference Recording Separation

Separate clear speech of individual speakers from multi-person conference recordings

SI-SDR improvement of 19.32 dB

Speech Enhancement

Recover intelligible speech for each speaker from recordings with overlapping speech (the reported figure was measured on the sep_clean task)

STOI improvement of 0.24
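
The SI-SDR figure quoted above is a scale-invariant signal-to-distortion ratio. As a rough illustration of what that metric measures, the sketch below computes it for two short sample lists in plain Python. The function name `si_sdr` and the toy signals are illustrative only; this is not the evaluation code used by Asteroid, and the mean-removal step is a common convention assumed here rather than something the model card states.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Remove the mean of each signal (assumed convention, see lead-in).
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the optimally scaled target.
    scale = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    distortion = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    # Note: a perfect estimate gives zero distortion energy, which this sketch does not handle.
    return 10 * math.log10(target_energy / distortion_energy)


# Toy signals (not real speech): the estimate is the reference plus a small error.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]

score = si_sdr(estimate, reference)
rescaled_score = si_sdr([3 * x for x in estimate], reference)
```

Here `score` comes out at about 20 dB, and `rescaled_score` is the same up to rounding, which is the "scale-invariant" part: multiplying the estimate by a constant does not change the metric.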

🚀 Asteroid model `mpariente/DPRNNTasNet-ks2_WHAM_sepclean`

This is an audio model imported from Zenodo, trained on the WHAM! dataset for audio separation tasks.

🚀 Quick Start

This model mpariente/DPRNNTasNet-ks2_WHAM_sepclean is imported from Zenodo. It was trained by Manuel Pariente using the wham/DPRNN recipe in Asteroid on the sep_clean task of the WHAM! dataset.
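
A minimal loading sketch, assuming the `asteroid` and `torch` packages are installed and that the model identifier above can be resolved by Asteroid's `BaseModel.from_pretrained`. The helper names `load_model` and `separate_mixture` are hypothetical wrappers introduced for this example, not part of the Asteroid API.

```python
MODEL_ID = "mpariente/DPRNNTasNet-ks2_WHAM_sepclean"
SAMPLE_RATE = 8000  # Hz, taken from the training config


def load_model(model_id: str = MODEL_ID):
    """Download the pretrained separator (or load it from the local cache)."""
    # Imported lazily so this module can be imported without Asteroid installed.
    from asteroid.models import BaseModel

    return BaseModel.from_pretrained(model_id)


def separate_mixture(model, mixture):
    """Run the separator on a mono mixture tensor of shape (batch, time).

    The mixture is assumed to be sampled at SAMPLE_RATE. The result is expected
    to have shape (batch, n_src, time), with n_src = 2 for this model.
    """
    import torch

    model.eval()
    with torch.no_grad():
        return model(mixture)
```

For a quick smoke test, something like `separate_mixture(load_model(), torch.randn(1, 2 * SAMPLE_RATE))` should return one estimate per speaker. Random noise is only a placeholder input, so the output of such a call is not meaningful audio.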

✨ Features

Trained on the sep_clean task of the WHAM! dataset.
Utilizes the wham/DPRNN recipe in Asteroid.

📚 Documentation

Description

This model was trained by Manuel Pariente using the wham/DPRNN recipe in Asteroid. It was trained on the sep_clean task of the WHAM! dataset.

Training config

data:
    mode: min
    nondefault_nsrc: None
    sample_rate: 8000
    segment: 2.0
    task: sep_clean
    train_dir: data/wav8k/min/tr
    valid_dir: data/wav8k/min/cv
filterbank:
    kernel_size: 2
    n_filters: 64
    stride: 1
main_args:
    exp_dir: exp/train_dprnn_new/
    gpus: -1
    help: None
masknet:
    bidirectional: True
    bn_chan: 128
    chunk_size: 250
    dropout: 0
    hid_size: 128
    hop_size: 125
    in_chan: 64
    mask_act: sigmoid
    n_repeats: 6
    n_src: 2
    out_chan: 64
optim:
    lr: 0.001
    optimizer: adam
    weight_decay: 1e-05
positional arguments:
training:
    batch_size: 3
    early_stop: True
    epochs: 200
    gradient_clipping: 5
    half_lr: True
    num_workers: 8
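
To make the numbers in this config more concrete, the sketch below works out how long a training segment is in samples, roughly how many encoder frames it produces, and how many chunks the dual-path masker would see. It assumes no padding at either stage; the actual Asteroid implementation may pad the signal, so the real frame and chunk counts can differ slightly. The helper `count_windows` is a hypothetical name used only for this illustration.

```python
# Values copied from the training config above.
SAMPLE_RATE = 8000      # data.sample_rate, in Hz
SEGMENT_SECONDS = 2.0   # data.segment
KERNEL_SIZE = 2         # filterbank.kernel_size
STRIDE = 1              # filterbank.stride
CHUNK_SIZE = 250        # masknet.chunk_size
HOP_SIZE = 125          # masknet.hop_size


def count_windows(length: int, window: int, hop: int) -> int:
    """Number of full windows of size `window` taken every `hop` steps (no padding)."""
    return (length - window) // hop + 1


# A 2-second training segment at 8 kHz.
segment_samples = int(SEGMENT_SECONDS * SAMPLE_RATE)

# Encoder frames: one per filterbank window over the waveform.
encoder_frames = count_windows(segment_samples, KERNEL_SIZE, STRIDE)

# Chunks seen by the dual-path masker, each CHUNK_SIZE frames long.
chunks = count_windows(encoder_frames, CHUNK_SIZE, HOP_SIZE)

# Fraction of each chunk shared with its neighbour.
chunk_overlap = 1 - HOP_SIZE / CHUNK_SIZE
```

With a stride of 1 the encoder emits almost one frame per input sample, which is why the sequence is long enough for chunking to matter. A hop of 125 against a chunk size of 250 gives 50% overlap between consecutive chunks.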

Results

si_sdr: 19.316743490695334
si_sdr_imp: 19.317895273889842
sdr: 19.68085347190952
sdr_imp: 19.5298092932871
sir: 30.362213998701232
sir_imp: 30.21116982007881
sar: 20.15553251343315
sar_imp: -129.02091762351188
stoi: 0.97772664309074
stoi_imp: 0.23968091518217424
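
One way to read these figures: assuming each `*_imp` value is the metric on the separated output minus the same metric on the unprocessed mixture, subtracting the improvement from the output score recovers the score of the mixture itself. The sketch below does that arithmetic; `input_baseline` is a hypothetical helper, and the interpretation of `*_imp` is an assumption rather than something stated in this card.

```python
# Values copied from the results above.
results = {
    "si_sdr": 19.316743490695334,
    "si_sdr_imp": 19.317895273889842,
    "sdr": 19.68085347190952,
    "sdr_imp": 19.5298092932871,
    "sar": 20.15553251343315,
    "sar_imp": -129.02091762351188,
    "stoi": 0.97772664309074,
    "stoi_imp": 0.23968091518217424,
}


def input_baseline(scores: dict, metric: str) -> float:
    """Metric value of the unprocessed mixture, assuming imp = output - input."""
    return scores[metric] - scores[f"{metric}_imp"]


si_sdr_input = input_baseline(results, "si_sdr")  # close to 0 dB
stoi_input = input_baseline(results, "stoi")      # roughly 0.74
sar_input = input_baseline(results, "sar")        # very large, roughly 149 dB
```

Under that assumption, the mixture starts at about 0 dB SI-SDR and about 0.74 STOI. The large negative `sar_imp` would then reflect a very high SAR for the untouched mixture, which contains essentially no processing artifacts, rather than poor output quality.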

License notice

This work "DPRNNTasNet-ks2_WHAM_sepclean" is a derivative of CSR-I (WSJ0) Complete by LDC, used under LDC User Agreement for Non-Members (Research only). "DPRNNTasNet-ks2_WHAM_sepclean" is licensed under Attribution-ShareAlike 3.0 Unported by Manuel Pariente.

📄 License

This model is licensed under cc-by-sa-4.0.
