BigVGAN v2 44kHz 128band 256x

Developed by NVIDIA
BigVGAN is a universal neural vocoder, trained at large scale, that converts mel-spectrograms into high-quality waveform audio.
Downloads 367
Release Date: 7/15/2024

Model Overview

BigVGAN is a high-performance neural vocoder that achieves high-quality audio synthesis through large-scale training, supporting various sampling rates and band configurations.

Model Features

Large-scale training
Trained on large-scale, diverse audio data, including multilingual speech, environmental sounds, and musical instruments
High-performance synthesis
Synthesizes high-quality audio at sampling rates up to 44kHz and upsampling ratios up to 512x
Custom CUDA kernels
Offers fused upsampling+activation CUDA kernels that can speed up inference by 1.5-3x
Improved discriminator
Uses a multi-scale sub-band CQT discriminator and a multi-scale mel-spectrogram loss to improve generation quality
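
To make the "256x" in the model name concrete: the upsampling ratio is the number of waveform samples the vocoder produces per mel-spectrogram frame. The short sketch below works through that arithmetic. It assumes that "44kHz" means the common 44,100 Hz rate, which this page does not state explicitly; the function names are illustrative and not part of any BigVGAN API.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not BigVGAN code.
SAMPLE_RATE = 44_100   # assumption: "44kHz" read as 44.1 kHz
UPSAMPLE_RATIO = 256   # "256x" in the model name


def frames_to_samples(num_frames: int, ratio: int = UPSAMPLE_RATIO) -> int:
    """Number of waveform samples generated from num_frames mel frames."""
    return num_frames * ratio


def frames_to_seconds(num_frames: int,
                      sample_rate: int = SAMPLE_RATE,
                      ratio: int = UPSAMPLE_RATIO) -> float:
    """Duration in seconds of the audio generated from num_frames mel frames."""
    return frames_to_samples(num_frames, ratio) / sample_rate


def frames_per_second(sample_rate: int = SAMPLE_RATE,
                      ratio: int = UPSAMPLE_RATIO) -> float:
    """Mel frames needed to cover one second of audio."""
    return sample_rate / ratio


if __name__ == "__main__":
    n = 1000
    print(f"{n} frames -> {frames_to_samples(n)} samples")
    print(f"{n} frames -> {frames_to_seconds(n):.3f} s of audio")
    print(f"frame rate: {frames_per_second():.3f} frames/s")
```

Under these assumptions, an upstream model such as a TTS acoustic model would need to emit roughly 172 mel frames for each second of output audio.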

Model Capabilities

High-quality audio synthesis
Mel-spectrogram to waveform conversion
Multi-sample rate support
Fast inference (using CUDA kernels)
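
The "128band" in the model name refers to the number of mel-frequency bands in the input spectrogram. As a rough illustration of how such bands are laid out, the sketch below spaces 128 band centers evenly on the mel scale using the HTK-style formula. The formula variant, the 0 Hz lower bound, and the 22,050 Hz upper bound (Nyquist for an assumed 44.1 kHz rate) are assumptions made for this example; the page does not say which settings the model actually uses.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (an assumption; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(num_bands: int = 128,
                     fmin: float = 0.0,
                     fmax: float = 22_050.0) -> list[float]:
    """Center frequencies (Hz) of num_bands bands spaced evenly in mel.

    fmin and fmax are assumed values chosen for illustration.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_bands)]


if __name__ == "__main__":
    centers = mel_band_centers()
    print(f"lowest band center:  {centers[0]:.1f} Hz")
    print(f"highest band center: {centers[-1]:.1f} Hz")
```

Because the spacing is uniform in mel rather than in Hz, the bands are packed densely at low frequencies and spread out toward the top of the range.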

Use Cases

Speech synthesis
TTS backend vocoder
Serves as the backend vocoder for text-to-speech systems, converting mel-spectrograms into natural speech
High-quality speech output
Audio enhancement
Low-quality audio reconstruction
Reconstructs high-quality waveforms from compressed or low-quality audio
Improved audio quality