
BigVGAN 24kHz 100-band

Developed by NVIDIA
BigVGAN is a neural vocoder trained at large scale to generate high-quality audio. Pre-trained models are available for multiple sample rates and frequency band configurations.
Downloads 273
Release Time: 7/15/2024

Model Overview

BigVGAN is a universal neural vocoder: given a mel spectrogram, it generates the corresponding audio waveform. Large-scale training lets it generalize across many audio types, and optional custom CUDA kernels speed up inference.
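The model name encodes the expected input: mel spectrograms with 100 bands, for audio at 24 kHz. As a rough sketch of the shapes involved, the Python below computes how many waveform samples a vocoder of this kind would produce for a given number of mel frames. The hop size of 256 samples per frame is an assumption for illustration, not something stated on this page; check the model's configuration file for the actual value. The helper names are hypothetical.

```python
SAMPLE_RATE = 24_000  # Hz, from the model name
NUM_MELS = 100        # mel bands, from the model name
HOP_SIZE = 256        # samples per mel frame (ASSUMED, not stated on this page)


def num_frames(mel):
    """Return the frame count of a mel spectrogram stored as NUM_MELS rows."""
    if len(mel) != NUM_MELS:
        raise ValueError(f"expected {NUM_MELS} mel bands, got {len(mel)}")
    frame_counts = {len(row) for row in mel}
    if len(frame_counts) != 1:
        raise ValueError("all mel bands must have the same number of frames")
    return frame_counts.pop()


def waveform_length(frames, hop_size=HOP_SIZE):
    """Each mel frame expands to hop_size waveform samples."""
    return frames * hop_size


def duration_seconds(frames, sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Duration of the generated audio in seconds."""
    return waveform_length(frames, hop_size) / sample_rate


# A silent placeholder spectrogram: 100 bands x 375 frames.
mel = [[0.0] * 375 for _ in range(NUM_MELS)]
frames = num_frames(mel)
print(frames, waveform_length(frames), duration_seconds(frames))
# prints: 375 96000 4.0
```

Under these assumptions, 375 frames correspond to four seconds of 24 kHz audio, which is a quick sanity check when wiring an acoustic model to the vocoder.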

Model Features

Large-scale training
Trained using a large-scale mixed dataset covering various audio types, including multilingual speech, environmental sounds, and musical instruments.
High-performance inference
Provides custom CUDA kernels that fuse the upsampling and activation operations, making inference 1.5x to 3x faster.
Multiple configuration support
Offers pre-trained models for several sample rates (22 kHz, 24 kHz, 44 kHz) and frequency band configurations, so a model can be matched to the application.
Improved discriminator and loss functions
Trained with a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss, both of which improve generation quality.
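To illustrate what fusing upsampling and activation means, here is a minimal Python sketch. It assumes the activation is the Snake function, x + sin²(αx)/α, applied between a 2x upsample and a 2x downsample, and it uses toy linear-interpolation and pair-averaging filters in place of the low-pass filters a real implementation would use. The function names are hypothetical. This is not the actual CUDA kernel, only the idea behind it: compute the same result in a single pass without storing the intermediate signals.

```python
import math


def snake(x, alpha=1.0):
    """Snake periodic activation: x + sin^2(alpha * x) / alpha."""
    return x + (math.sin(alpha * x) ** 2) / alpha


def upsample2x(signal):
    """Toy 2x upsampling: keep each sample and insert the midpoint to its neighbour."""
    out = []
    for i, s in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else s
        out.append(s)
        out.append(0.5 * (s + nxt))
    return out


def downsample2x(signal):
    """Toy 2x downsampling: average each pair of samples."""
    return [0.5 * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]


def unfused(signal, alpha=1.0):
    """Three separate passes, each materializing an intermediate buffer."""
    upsampled = upsample2x(signal)
    activated = [snake(v, alpha) for v in upsampled]
    return downsample2x(activated)


def fused(signal, alpha=1.0):
    """One pass: upsample, activate, and downsample per output sample."""
    out = []
    n = len(signal)
    for i in range(n):
        s = signal[i]
        nxt = signal[i + 1] if i + 1 < n else s
        mid = 0.5 * (s + nxt)
        out.append(0.5 * (snake(s, alpha) + snake(mid, alpha)))
    return out


x = [0.0, 0.5, -1.0, 2.0]
assert all(math.isclose(a, b) for a, b in zip(unfused(x), fused(x)))
```

Both functions return the same values; the fused version simply avoids the two intermediate lists. On a GPU, the analogous saving is in memory traffic between kernel launches, which is where a speedup of this kind would come from.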

Model Capabilities

High-quality audio generation
Mel spectrogram to waveform conversion
Multi-sample rate support
Fast inference

Use Cases

Speech synthesis
Text-to-speech systems
Serves as the backend vocoder for TTS systems, converting mel spectrograms into natural speech waveforms.
Result: high-quality, natural speech output.
Audio enhancement
Audio super-resolution
Converts low-quality audio into high-quality waveforms.
Result: improved audio quality and clarity.
Music generation
Music synthesis
Generates musical instrument sounds and environmental sounds.
Result: high-quality music clips.
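In a text-to-speech pipeline, the vocoder's output still has to be turned into an audio file. The following is a minimal sketch using only the Python standard library. It assumes the vocoder returns mono floating-point samples in [-1, 1] at 24 kHz; the sine tone is a stand-in for real model output, and `float_to_int16` and `write_wav` are hypothetical helper names.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, matching the 24 kHz model


def float_to_int16(samples):
    """Clip samples to [-1, 1] and scale them to 16-bit PCM integers."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))
        out.append(int(round(s * 32767)))
    return out


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM audio to a file path or a binary file object."""
    pcm = float_to_int16(samples)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for vocoder output: one second of a 440 Hz tone at half amplitude.
fake_output = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
buffer = io.BytesIO()
write_wav(buffer, fake_output)
```

Passing a path such as `"output.wav"` instead of the in-memory buffer writes the file to disk. Clipping before scaling guards against samples that slightly overshoot the [-1, 1] range.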