BigVGAN v2 22kHz 80band fmax8k 256x

Developed by NVIDIA
BigVGAN is a universal neural vocoder, trained at large scale, that converts mel-spectrograms into high-quality waveforms. Version 2 speeds up inference with custom CUDA kernels and is trained on a more diverse dataset.
Downloads 1,285
Release Date: 7/15/2024

Model Overview

BigVGAN is a high-performance neural vocoder that achieves high-quality audio synthesis through adversarial training. The model family supports multiple sampling rates and mel-band configurations and is suitable for generating speech, music, and environmental sound effects.

Model Features

Custom CUDA kernel acceleration
Provides a fused upsampling + activation CUDA kernel that speeds up inference by 1.5-3x
Multi-scale discriminator
Trained with multi-scale sub-band CQT discriminators and a mel-spectrogram loss to improve audio quality
Diverse training data
Training set covers multiple audio types including multilingual speech, environmental sound effects, and instrument sounds
High upsampling rate
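The BigVGAN v2 family supports upsampling ratios of up to 512x and can generate audio at sampling rates of up to 44 kHz; this checkpoint, as its name indicates, uses a 256x upsampling ratio at 22 kHz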
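
The upsampling ratio is the number of waveform samples the vocoder produces per mel-spectrogram frame, so the output length follows directly from the frame count. The short Python sketch below works through that arithmetic for this checkpoint. The ratio of 256 comes from the model name; the exact sampling rate of 22,050 Hz is an assumption behind the "22khz" label, and the function names are illustrative, not part of any BigVGAN API.

```python
# Assumption: "22khz" in the model name stands for 22,050 Hz.
SAMPLING_RATE_HZ = 22_050
# From the "256x" suffix in the model name: samples generated per mel frame.
UPSAMPLING_RATIO = 256


def frames_to_samples(num_frames: int, ratio: int = UPSAMPLING_RATIO) -> int:
    """Number of waveform samples produced from num_frames mel frames."""
    return num_frames * ratio


def samples_to_seconds(num_samples: int, rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Duration in seconds of num_samples at the given sampling rate."""
    return num_samples / rate_hz


def frames_per_second(rate_hz: int = SAMPLING_RATE_HZ,
                      ratio: int = UPSAMPLING_RATIO) -> float:
    """Mel frame rate the vocoder expects for real-time audio."""
    return rate_hz / ratio


if __name__ == "__main__":
    n = frames_to_samples(100)
    print(n, "samples,", round(samples_to_seconds(n), 3), "seconds")
    print(round(frames_per_second(), 2), "mel frames per second")
```

Under these assumptions, one second of audio corresponds to roughly 86 mel frames, which is the frame rate an upstream model would need to produce for this checkpoint.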

Model Capabilities

Mel-spectrogram to waveform conversion
High-quality speech synthesis
Music audio generation
Environmental sound synthesis

Use Cases

Speech synthesis
Text-to-speech system
Serves as the vocoder component in TTS pipelines to convert mel-spectrograms into natural speech
Reported to achieve state-of-the-art results on the LibriTTS dataset
Audio enhancement
Low-quality audio restoration
Improves clarity of low-quality recordings through mel-spectrogram reconstruction techniques
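
To make the vocoder's place in a TTS pipeline concrete, the Python sketch below wires a text-to-mel stage to a mel-to-waveform stage and checks the shapes passed between them. Both stages are hypothetical placeholders that emit silence; they are not BigVGAN and do not call its API. Only the 80 mel bands and the 256 samples per frame are taken from the model name.

```python
from typing import Callable, List

N_MELS = 80        # from "80band" in the model name
HOP_SAMPLES = 256  # from "256x" in the model name

Mel = List[List[float]]  # shape [N_MELS][num_frames]
Waveform = List[float]   # shape [num_samples]


def placeholder_acoustic_model(text: str) -> Mel:
    """Hypothetical stand-in for a text-to-mel model: 5 silent frames per character."""
    num_frames = 5 * len(text)
    return [[0.0] * num_frames for _ in range(N_MELS)]


def placeholder_vocoder(mel: Mel) -> Waveform:
    """Hypothetical stand-in for a neural vocoder: silence of the expected length."""
    if len(mel) != N_MELS:
        raise ValueError(f"expected {N_MELS} mel bands, got {len(mel)}")
    num_frames = len(mel[0])
    return [0.0] * (num_frames * HOP_SAMPLES)


def synthesize(
    text: str,
    acoustic_model: Callable[[str], Mel] = placeholder_acoustic_model,
    vocoder: Callable[[Mel], Waveform] = placeholder_vocoder,
) -> Waveform:
    """Two-stage pipeline: text -> mel-spectrogram -> waveform."""
    return vocoder(acoustic_model(text))


if __name__ == "__main__":
    wav = synthesize("hello")
    print(len(wav), "samples")
```

In a real system the second callable would wrap the loaded vocoder, and the first would be any acoustic model whose mel settings (band count, hop size, frequency range) match the checkpoint.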