Open-source Audio Source Separation Model: Precise Separation of 8 kHz Speech

Audio Source Separation

Published by Awais (model trained by Joris Cosentino)

Audio source separation model trained with the Asteroid framework and optimized for two-speaker speech separation at an 8 kHz sampling rate.

Sound Separation

PyTorch

#Dual-source separation #8kHz audio processing #ConvTasNet architecture

Downloads: 30

Release date: 4/2/2022

Model Overview

This model uses the ConvTasNet architecture and is trained for clean speech separation on the Libri2Mix dataset (`sep_clean` task). It separates the voices of two speakers from a single-channel mixture.

Model Features

Efficient separation

Uses the ConvTasNet architecture to separate speech efficiently at an 8 kHz sampling rate.

Optimized training

Trained specifically on the `sep_clean` task of Libri2Mix, so it is best suited to clean (noise-free) speech mixtures.

Lightweight

Moderate parameter count, suitable for practical deployment.
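To put the parameter count into numbers, the sketch below estimates it from the filterbank and masknet values in the training config. It assumes a standard Conv-TasNet layer layout that the card does not state: global layer norm with two parameters per channel, a single-parameter PReLU, biased 1x1 convolutions, a bias-free encoder and decoder, and a depthwise convolution kernel of 3. Asteroid's actual modules may differ slightly, so treat the result as approximate.

```python
# Rough parameter-count estimate for a Conv-TasNet with this card's config.
# ASSUMPTIONS (not stated in the card): gLN with 2 params per channel,
# single-parameter PReLU, biased 1x1 convs, bias-free encoder/decoder,
# and a depthwise conv kernel of 3. Asteroid's exact layout may differ.

def conv1x1(c_in, c_out):
    """Weights plus bias of a pointwise (1x1) convolution."""
    return c_in * c_out + c_out

def estimate_params(n_filters=512, kernel_size=16, n_src=2, bn_chan=128,
                    hid_chan=512, skip_chan=128, n_blocks=8, n_repeats=3,
                    conv_kernel_size=3):
    encoder = n_filters * kernel_size                        # learned analysis filterbank
    decoder = n_filters * kernel_size                        # learned synthesis filterbank
    bottleneck = 2 * n_filters + conv1x1(n_filters, bn_chan)  # gLN + 1x1 conv
    block = (
        conv1x1(bn_chan, hid_chan)                  # expand to hid_chan
        + 1 + 2 * hid_chan                          # PReLU + gLN
        + hid_chan * conv_kernel_size + hid_chan    # dilated depthwise conv + bias
        + 1 + 2 * hid_chan                          # PReLU + gLN
        + conv1x1(hid_chan, bn_chan)                # residual output
        + conv1x1(hid_chan, skip_chan)              # skip output
    )
    tcn = n_blocks * n_repeats * block
    mask_head = 1 + conv1x1(skip_chan, n_src * n_filters)    # PReLU + mask conv
    return encoder + decoder + bottleneck + tcn + mask_head

print(f"{estimate_params():,}")
```

Under these assumptions the estimate is 5,050,545 parameters, roughly 5 million, almost all of them in the 24 (8 × 3) TCN blocks.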

Model Capabilities

Dual-speaker speech separation

8kHz audio processing

Real-time audio source separation

Use Cases

Speech processing

Meeting recording enhancement

Separate the voices of different speakers in meeting recordings

SI-SDR improvement of 14.76 dB on the Libri2Mix test set

Speech recognition preprocessing

Provide cleaner single-speaker audio input for ASR systems

STOI of 0.93 on the Libri2Mix test set (an improvement of 0.22 over the unprocessed mixture)
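The SI-SDR improvement quoted above is the SI-SDR of the separated signal minus the SI-SDR of the unprocessed mixture, both measured against the clean reference. The NumPy sketch below implements the standard SI-SDR definition and, because the order of the model's outputs is arbitrary, scores the best speaker permutation. It is an illustration of the metric, not Asteroid's own evaluation code, which may differ in details such as epsilon handling.

```python
import itertools
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

def pit_si_sdr(estimates, references):
    """Mean SI-SDR over sources, using the best assignment of estimates to speakers."""
    n_src = len(references)
    return max(
        np.mean([si_sdr(estimates[p], references[i]) for i, p in enumerate(perm)])
        for perm in itertools.permutations(range(n_src))
    )

def si_sdr_improvement(estimates, references, mixture):
    """SI-SDRi: score of the estimates minus the score of the unprocessed mixture."""
    baseline = np.mean([si_sdr(mixture, ref) for ref in references])
    return pit_si_sdr(estimates, references) - baseline
```

Scale invariance is what distinguishes SI-SDR from plain SDR: multiplying an estimate by any nonzero gain leaves its score unchanged.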

🚀 Asteroid model `Awais/Audio_Source_Separation`

This is an audio source separation model imported from Zenodo. It separates a two-speaker speech mixture into one estimated signal per speaker.

🚀 Quick Start

This model is imported from Zenodo and is meant to be used with the Asteroid toolkit on single-channel, 8 kHz mixtures of two speakers.
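The model expects single-channel audio at 8 kHz (`sample_rate: 8000` in the training config). A minimal sketch of preparing such input with Python's standard `wave` module and NumPy follows. It reads 16-bit mono PCM and refuses other sample rates rather than resampling silently. The `(1, time)` output shape is an assumption about what a batched separation model accepts. The pretrained weights could then be loaded with Asteroid, for example via `asteroid.models.ConvTasNet.from_pretrained("Awais/Audio_Source_Separation")`, assuming the checkpoint is published under that identifier; that step needs Asteroid and network access and is outside this sketch.

```python
import wave
import numpy as np

EXPECTED_SR = 8000  # the model was trained on 8 kHz audio (data.sample_rate)

def load_mixture(path):
    """Read a 16-bit PCM mono WAV file (path or file object) as float32 in [-1, 1)."""
    with wave.open(path, "rb") as f:
        if f.getframerate() != EXPECTED_SR:
            raise ValueError(
                f"expected {EXPECTED_SR} Hz, got {f.getframerate()} Hz; resample first"
            )
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype="<i2")
    # ASSUMPTION: the separation model takes a (batch, time) float array.
    return (pcm.astype(np.float32) / 32768.0)[None, :]
```

Failing loudly on a wrong sample rate is deliberate: feeding 16 kHz audio to a model trained at 8 kHz would halve the apparent pitch and timing of every sound it sees.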

✨ Features

Trained on a Specific Dataset: This model was trained by Joris Cosentino using the librimix recipe in Asteroid, on the sep_clean task of the Libri2Mix dataset.
Configurable Training: The training configuration (data settings, filterbank parameters, masknet settings, optimization strategy, and training hyper-parameters) can be adjusted to suit different requirements.
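The `half_lr` and `early_stop` flags in the training config control how optimization reacts to a stalling validation loss. As far as we know, Asteroid's LibriMix recipe implements them with PyTorch's `ReduceLROnPlateau` (factor 0.5) and an early-stopping callback; the pure-Python simulation below mimics that behaviour in simplified form. The patience values (5 and 30 epochs) are assumptions, and the sketch ignores PyTorch's relative improvement threshold and cooldown.

```python
def simulate_schedule(val_losses, lr=1e-3, half_lr_patience=5, stop_patience=30):
    """Return (learning rate used in each epoch, epoch at which training stops or None).

    Simplified: a loss counts as an improvement only if it is strictly lower
    than the best loss seen so far. Patience values are assumptions.
    """
    best = float("inf")
    bad_for_lr = 0    # epochs without improvement since the last LR change
    bad_for_stop = 0  # epochs without improvement since the best loss
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)  # learning rate used during this epoch
        if loss < best:
            best, bad_for_lr, bad_for_stop = loss, 0, 0
        else:
            bad_for_lr += 1
            bad_for_stop += 1
        # half_lr: halve the LR once the loss has stalled for more than the patience
        if bad_for_lr > half_lr_patience:
            lr /= 2
            bad_for_lr = 0
        # early_stop: stop once the loss has stalled for stop_patience epochs
        if bad_for_stop >= stop_patience:
            return lrs, epoch
    return lrs, None

lrs, stop = simulate_schedule([1.0, 0.9, 0.8] + [0.8] * 40)
print(stop)  # 32
```

With a loss that stops improving after epoch 2, the learning rate is halved every six stalled epochs and training ends at epoch 32, well before the `epochs: 200` limit.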

📦 Installation

The model is used through the Asteroid toolkit, which is available on PyPI (`pip install asteroid`).

📚 Documentation

Description

This model was trained by Joris Cosentino using the librimix recipe in Asteroid. It was trained on the sep_clean task of the Libri2Mix dataset.

Training Config

```yaml
data:
    n_src: 2
    sample_rate: 8000
    segment: 3
    task: sep_clean
    train_dir: data/wav8k/min/train-360
    valid_dir: data/wav8k/min/dev
filterbank:
    kernel_size: 16
    n_filters: 512
    stride: 8
masknet:
    bn_chan: 128
    hid_chan: 512
    mask_act: relu
    n_blocks: 8
    n_repeats: 3
    skip_chan: 128
optim:
    lr: 0.001
    optimizer: adam
    weight_decay: 0.0
training:
    batch_size: 24
    early_stop: True
    epochs: 200
    half_lr: True
    num_workers: 2
```
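The config values fix the model's time resolution. The arithmetic below derives, for a 3-second training segment, the number of encoder frames and the span of input audio that one output of the mask network can see. It assumes an unpadded encoder and a depthwise convolution kernel of 3 in each TCN block (a common default, but not listed in this config); with a different kernel size the receptive field changes.

```python
# Values from the training config above.
SAMPLE_RATE = 8000  # data.sample_rate
SEGMENT_S = 3       # data.segment, in seconds
KERNEL_SIZE = 16    # filterbank.kernel_size
STRIDE = 8          # filterbank.stride
N_BLOCKS = 8        # masknet.n_blocks
N_REPEATS = 3       # masknet.n_repeats
CONV_KERNEL = 3     # ASSUMPTION: depthwise conv kernel size (not in the config)

def n_frames(n_samples):
    """Encoder frames produced by a strided 1-D conv without padding."""
    return (n_samples - KERNEL_SIZE) // STRIDE + 1

def receptive_field_frames():
    """Frames seen by one TCN output: dilations 1, 2, ..., 2**(N_BLOCKS-1), repeated."""
    per_repeat = sum((CONV_KERNEL - 1) * 2**b for b in range(N_BLOCKS))
    return 1 + N_REPEATS * per_repeat

def frames_to_samples(frames):
    """Input samples covered by `frames` consecutive encoder frames."""
    return (frames - 1) * STRIDE + KERNEL_SIZE

segment_samples = SAMPLE_RATE * SEGMENT_S
rf = receptive_field_frames()
print(segment_samples, n_frames(segment_samples))                         # 24000 2999
print(rf, frames_to_samples(rf), frames_to_samples(rf) / SAMPLE_RATE)     # 1531 12256 1.532
```

Under these assumptions each mask frame depends on about 1.53 s of audio, roughly half of a training segment, while the encoder produces a new frame every 1 ms (8 samples at 8 kHz).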

Results

On the Libri2Mix test set (`min` mode):

Metric	Score	Improvement over mixture
SI-SDR (dB)	14.764543634468069	14.764029375607246
SDR (dB)	15.29337970745095	15.114146605113111
SIR (dB)	24.092904661115366	23.913669683141528
SAR (dB)	16.06055906916849	-51.980784441287454
STOI	0.9311142440593033	0.21817376142710482
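Each improvement value is the model's score minus the score of the unprocessed mixture, so subtracting the two recovers the baseline the model started from. The arithmetic, using the numbers reported above:

```python
# (score, improvement) pairs copied from the Libri2Mix test-set results.
results = {
    "si_sdr": (14.764543634468069, 14.764029375607246),
    "sdr": (15.29337970745095, 15.114146605113111),
    "sir": (24.092904661115366, 23.913669683141528),
    "sar": (16.06055906916849, -51.980784441287454),
    "stoi": (0.9311142440593033, 0.21817376142710482),
}

# improvement = score(estimate) - score(mixture)  =>  score(mixture) = score - improvement
mixture_scores = {name: score - imp for name, (score, imp) in results.items()}
for name, value in mixture_scores.items():
    print(f"{name}: {value:.3f}")
```

The mixture itself scores about 0 dB SI-SDR, about 0.18 dB SDR and SIR, and about 0.71 STOI. Its SAR of about 68 dB is high most likely because an unprocessed mixture contains no processing artifacts, which is why the SAR improvement is strongly negative even though the separated output reaches 16 dB SAR.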

🔧 Technical Details

The model uses the ConvTasNet architecture: a learned filterbank encoder (512 filters, kernel size 16, stride 8), a temporal convolutional mask network that estimates one mask per speaker, and a decoder that maps each masked representation back to a waveform. It is trained on Libri2Mix with the configuration listed above, using 3-second segments at 8 kHz.
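To make the encoder, mask, and decoder pipeline concrete, here is a NumPy sketch of how a mixture flows through a ConvTasNet-style model with this card's filterbank settings. The filterbanks and masks are random placeholders (a trained model learns the filterbanks and predicts the masks with its TCN), the encoder is kept linear, and no padding is applied, so the outputs carry no meaning; only the shapes and the masking step mirror the real model.

```python
import numpy as np

N_FILTERS, KERNEL, STRIDE, N_SRC = 512, 16, 8, 2  # from the training config

rng = np.random.default_rng(0)
# Placeholder filterbanks; a trained model learns these.
enc_basis = rng.standard_normal((N_FILTERS, KERNEL)) * 0.1
dec_basis = rng.standard_normal((N_FILTERS, KERNEL)) * 0.1

def encode(wav):
    """Strided 1-D convolution: one N_FILTERS-dim frame every STRIDE samples."""
    n_frames = (len(wav) - KERNEL) // STRIDE + 1
    frames = np.stack([wav[t * STRIDE : t * STRIDE + KERNEL] for t in range(n_frames)])
    return enc_basis @ frames.T  # (N_FILTERS, n_frames)

def decode(rep, n_samples):
    """Transposed convolution: overlap-add one decoded frame per time step."""
    frames = rep.T @ dec_basis  # (n_frames, KERNEL)
    out = np.zeros(n_samples)
    for t, frame in enumerate(frames):
        out[t * STRIDE : t * STRIDE + KERNEL] += frame
    return out

def separate(wav, masks):
    """Multiply the shared mixture representation by one mask per source, then decode."""
    rep = encode(wav)
    return np.stack([decode(rep * mask, len(wav)) for mask in masks])

mixture = rng.standard_normal(8000 * 3)  # 3 s of noise at 8 kHz as a stand-in mixture
rep = encode(mixture)
# Stand-in for the TCN mask network; the ReLU matches mask_act: relu in the config.
masks = np.maximum(rng.standard_normal((N_SRC, *rep.shape)), 0.0)
estimates = separate(mixture, masks)
print(rep.shape, estimates.shape)  # (512, 2999) (2, 24000)
```

All sources share one encoder pass and one decoder; only the masks differ, which is what keeps the per-source cost of masking-based separation low.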

📄 License

This work "ConvTasNet_Libri2Mix_sepclean_8k" is a derivative of the LibriSpeech ASR corpus by Vassil Panayotov, used under CC BY 4.0. "ConvTasNet_Libri2Mix_sepclean_8k" is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Cosentino Joris.

Additional Information

Property	Details
Model Type	Asteroid model (`Awais/Audio_Source_Separation`)
Training Data	Libri2Mix (`sep_clean` task)
Tags	asteroid, audio, ConvTasNet, audio-to-audio
