TIGER-DnR Open-Source Lightweight Speech Separation Model - Efficiently Achieve Clear Audio Processing and Separation

TIGER DnR

Developed by JusperLee

TIGER is a lightweight speech separation model that achieves efficient audio processing through frequency band segmentation and multi-scale feature extraction

Sound Separation

Safetensors

Language: English

Open Source License: Apache-2.0

Tags: Lightweight speech separation, Multi-scale frequency band modeling, Low-parameter efficiency

Downloads 134

Release Time: 1/22/2025

Model Overview

TIGER is an efficient speech separation model that combines frequency band segmentation with an interleaved modeling architecture, significantly reducing computational cost while maintaining high performance. It is primarily used for speech separation, noise reduction, and reverberation elimination.

Model Features

Efficient frequency band segmentation

Divides frequency bands using prior knowledge and compresses frequency information, significantly reducing computational costs

Multi-scale feature extraction

Utilizes multi-scale selective attention (MSA) modules to effectively extract contextual features

Lightweight design

Compared with the TF-GridNet model, reduces the parameter count by 94.3% and MACs by 95.3% while maintaining high performance

Real-world scenario adaptation

Outperforms TF-GridNet in both inference speed and separation quality on the EchoSet dataset, which contains noise and realistic reverberation
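The frequency band segmentation feature above can be illustrated with a minimal sketch. It is not TIGER's implementation: the band widths, the bin count, and the use of a plain average as the "compression" step are all assumptions chosen for illustration, and the function names `split_into_bands` and `compress_bands` are hypothetical. The sketch only shows the general idea of using narrower bands at low frequencies and wider bands at high frequencies, then reducing each band to a smaller representation.

```python
# Illustrative sketch only: band widths and mean-pooling are assumptions,
# not the band layout or learned compression used by TIGER.

def split_into_bands(num_bins, band_widths):
    """Return (start, end) bin index pairs that cover [0, num_bins)."""
    bands = []
    start = 0
    for width in band_widths:
        if start >= num_bins:
            break
        end = min(start + width, num_bins)
        bands.append((start, end))
        start = end
    if start < num_bins:
        # Put any leftover bins into one final band.
        bands.append((start, num_bins))
    return bands


def compress_bands(frame, bands):
    """Reduce each band of one spectral frame to a single value (its mean)."""
    return [sum(frame[s:e]) / (e - s) for s, e in bands]


# Hypothetical layout: narrow bands at low frequencies, wider ones higher up.
NUM_BINS = 32
BAND_WIDTHS = [2] * 4 + [4] * 2 + [8] * 2

bands = split_into_bands(NUM_BINS, BAND_WIDTHS)
frame = [float(i) for i in range(NUM_BINS)]  # stand-in magnitude spectrum
compressed = compress_bands(frame, bands)
print(len(frame), "bins ->", len(compressed), "band features")
```

In this toy setup, later layers would operate on 8 band features per frame instead of 32 bins, which is the sense in which a prior-driven band split can cut computation.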

Model Capabilities

Speech separation

Background noise elimination

Reverberation elimination

Multi-speaker speech separation

Use Cases

Speech enhancement

Meeting recording enhancement

Separates clear individual speech from recordings with multiple people speaking simultaneously

Outperforms the TF-GridNet model on the EchoSet dataset

Noisy environment speech processing

Eliminates background noise and reverberation to improve speech clarity

Effectively handles real-world reverberation influenced by object occlusion and material properties

Audio post-production

Film and TV audio restoration

Separates and enhances target speech from field recordings

🚀 TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

A lightweight model for speech separation which effectively extracts key acoustic features through frequency band-split, multi-scale and full-frequency-frame modeling.

🚀 Quick Start

Test with Pre-trained Model

# Test using speech
python inference_speech.py --audio_path test/mix.wav

# Test using DnR
python inference_dnr.py --audio_path test/test_mixture_466.wav
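To run the same test over several files, the commands above can be wrapped in a small helper. This is a sketch under assumptions: it relies only on the `--audio_path` flag shown above, assumes the script is in the current directory, and the helper names `build_command` and `run_on_directory` are hypothetical. It defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script, audio_path):
    """Build the command line for one file, using only the --audio_path flag."""
    return [sys.executable, script, "--audio_path", str(audio_path)]


def run_on_directory(script, directory, dry_run=True):
    """Run (or just print) the inference command for every .wav in a directory."""
    commands = [
        build_command(script, path)
        for path in sorted(Path(directory).glob("*.wav"))
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first file whose inference fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: prints one command per .wav file found in test/.
    run_on_directory("inference_speech.py", "test")
```

Pass `dry_run=False` to execute the commands, or swap in `inference_dnr.py` for DnR mixtures.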

Train with EchoSet

python audio_train.py --conf_dir configs/tiger.yml

Evaluate with EchoSet

python audio_test.py --conf_dir configs/tiger.yml

✨ Features

TIGER is a lightweight model for speech separation which effectively extracts key acoustic features through frequency band-split, multi-scale and full-frequency-frame modeling.
We release the code and pre-trained model of TIGER! 🚀
We release the TIGER model and the EchoSet dataset! 🚀

📚 Documentation

📜 Abstract

In this paper, we propose a speech separation model with significantly reduced parameter size and computational cost: Time-Frequency Interleaved Gain Extraction and Reconstruction Network (TIGER). TIGER leverages prior knowledge to divide frequency bands and applies compression on frequency information. We employ a multi-scale selective attention (MSA) module to extract contextual features, while introducing a full-frequency-frame attention (F^3A) module to capture both temporal and frequency contextual information. Additionally, to more realistically evaluate the performance of speech separation models in complex acoustic environments, we introduce a novel dataset called EchoSet. This dataset includes noise and more realistic reverberation (e.g., considering object occlusions and material properties), with speech from two speakers overlapping at random proportions. Experimental results demonstrated that TIGER significantly outperformed state-of-the-art (SOTA) model TF-GridNet on the EchoSet dataset in both inference speed and separation quality, while reducing the number of parameters by 94.3% and the MACs by 95.3%. These results indicate that by utilizing frequency band-split and interleaved modeling structures, TIGER achieves a substantial reduction in parameters and computational costs while maintaining high performance. Notably, TIGER is the first speech separation model with fewer than 1 million parameters that achieves performance close to the SOTA model.
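The reduction percentages in the abstract can be read as compression ratios. The short calculation below uses only the two figures quoted above (94.3% and 95.3%); the function name `reduction_factor` is hypothetical, and no absolute parameter or MAC counts are assumed.

```python
def reduction_factor(reduction_percent):
    """Convert 'reduced by X%' into 'N times smaller'."""
    remaining = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining


for name, pct in [("parameters", 94.3), ("MACs", 95.3)]:
    print(f"{name}: {100 - pct:.1f}% remain, "
          f"about {reduction_factor(pct):.1f}x fewer")
```

Under this reading, TIGER keeps roughly 5.7% of the baseline's parameters (about 17.5x fewer) and 4.7% of its MACs (about 21.3x fewer).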

📖 Citation

@article{xu2024tiger,
  title={TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation},
  author={Xu, Mohan and Li, Kai and Chen, Guo and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2410.01469},
  year={2024}
}

📄 License

This project is licensed under the Apache 2.0 license.

📧 Contact

If you have any questions, please feel free to contact us via tsinghua.kaili@gmail.com.

Mohan Xu^*, Kai Li^*, Guo Chen, Xiaolin Hu
Tsinghua University, Beijing, China
^*Equal contribution
📜 ICLR 2025 | 🎶 Demo | 🤗 Dataset

💥 News

[2025-01-23] We release the code and pre-trained model of TIGER! 🚀
[2025-01-23] We release the TIGER model and the EchoSet dataset! 🚀
