BigVGAN v2 44kHz 128band 512x

Developed by NVIDIA
BigVGAN is a universal neural vocoder based on large-scale training, capable of generating high-quality audio waveforms.
Downloads 223.13k
Release Date: 7/15/2024

Model Overview

BigVGAN is a high-performance neural vocoder that achieves universal audio generation through large-scale training, supporting multiple sample rates and upsampling configurations.

Model Features

Large-scale training
Trained on a large-scale dataset covering diverse audio types, including multilingual speech, environmental sounds, and musical instruments.
High-performance inference
Provides fused CUDA kernels for inference, achieving a 1.5x to 3x speedup on a single A100 GPU.
Multiple configuration support
Offers pre-trained checkpoints for various audio configurations, supporting up to 44 kHz sample rate and 512x upsampling.
Improved discriminator
Uses multi-scale sub-band CQT discriminators and a multi-scale mel spectrogram loss during training.
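
To make the configuration in this model's name concrete, here is a minimal sketch of the arithmetic implied by 512x upsampling: each mel spectrogram frame becomes 512 waveform samples. The constant and function names are hypothetical, and treating "44 kHz" as 44,100 Hz is an assumption not stated on this page.

```python
# Illustrative arithmetic only; this does not load or run the model.
SAMPLE_RATE_HZ = 44_100   # assumption: "44 kHz" is taken to mean 44.1 kHz
NUM_MEL_BANDS = 128       # "128band" in the model name
UPSAMPLE_FACTOR = 512     # "512x": waveform samples produced per mel frame


def waveform_length(num_frames: int) -> int:
    """Number of audio samples generated from `num_frames` mel frames."""
    return num_frames * UPSAMPLE_FACTOR


def duration_seconds(num_frames: int) -> float:
    """Duration of the generated audio at the assumed sample rate."""
    return waveform_length(num_frames) / SAMPLE_RATE_HZ


def frames_per_second() -> float:
    """Mel frames needed to describe one second of audio."""
    return SAMPLE_RATE_HZ / UPSAMPLE_FACTOR


if __name__ == "__main__":
    frames = 100
    print(f"input mel shape: ({NUM_MEL_BANDS}, {frames})")
    print(f"output samples:  {waveform_length(frames)}")
    print(f"duration (s):    {duration_seconds(frames):.3f}")
    print(f"frames / second: {frames_per_second():.2f}")
```

Under these assumptions, a mel spectrogram of 128 bands by 100 frames maps to 51,200 samples, a little over one second of audio.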

Model Capabilities

High-quality audio generation
Mel spectrogram to waveform conversion
Multi-sample rate support
Fast inference

Use Cases

Speech synthesis
Text-to-speech systems
Serves as the backend vocoder for TTS systems, converting mel spectrograms into natural speech waveforms.
Generates high-quality, natural speech output
Audio enhancement
Audio super-resolution
Increases the sample rate and improves the quality of low-quality audio.
Generates high-fidelity audio output
Music generation
Musical instrument synthesis
Generates audio waveforms for various musical instruments.
Produces high-quality musical instrument sounds