BigVGAN v2 24kHz 100band 256x

Developed by NVIDIA
BigVGAN is a high-performance neural vocoder that achieves high-quality audio synthesis through large-scale training, supporting multiple sampling rates and frequency band configurations.
Downloads 34.03k
Release date: 7/15/2024

Model Overview

BigVGAN is a universal neural vocoder capable of converting mel spectrograms into high-quality waveform audio. Through large-scale training and advanced architectural design, it achieves excellent audio generation results.
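
The relationship between the mel spectrogram input and the waveform output can be sketched with simple arithmetic. The snippet below is a minimal illustration, not BigVGAN's own code. It assumes the model name means 24 kHz output, 100 mel bands per frame, and 256 waveform samples generated per mel frame. The constants and helper functions are hypothetical names introduced for this example.

```python
# Values read off the model name "24kHz 100band 256x".
# This interpretation is an assumption made for illustration.
SAMPLE_RATE_HZ = 24_000   # output sampling rate
NUM_MEL_BANDS = 100       # mel channels per spectrogram frame
UPSAMPLE_FACTOR = 256     # waveform samples produced per mel frame


def mel_shape(num_frames: int) -> tuple[int, int]:
    """Shape (bands, frames) of the mel spectrogram fed to the vocoder."""
    return (NUM_MEL_BANDS, num_frames)


def waveform_length(num_frames: int) -> int:
    """Number of audio samples produced for `num_frames` mel frames."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    return num_frames * UPSAMPLE_FACTOR


def duration_seconds(num_frames: int) -> float:
    """Duration of the generated audio in seconds."""
    return waveform_length(num_frames) / SAMPLE_RATE_HZ


if __name__ == "__main__":
    frames = 375
    print(mel_shape(frames))         # input shape: (bands, frames)
    print(waveform_length(frames))   # output length in samples
    print(duration_seconds(frames))  # output duration in seconds
```

Under these assumptions, one second of audio corresponds to 24000 / 256 = 93.75 mel frames, so a TTS front end would need to emit mel frames at that rate for the vocoder output to play back at the intended speed.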

Model Features

Large-scale training
Trained on a diverse audio dataset including multilingual speech, environmental sounds, and musical instruments to enhance the model's generalization capability.
High-performance inference
Provides custom CUDA kernels supporting fused upsampling + activation operations, improving inference speed by 1.5-3x.
Multi-configuration support
Offers pre-trained models at several sampling rates (22 kHz to 44 kHz) and mel band configurations to suit different application scenarios.
Improved discriminator
Trained with a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss to improve generation quality.

Model Capabilities

Mel spectrogram to waveform conversion
High-quality audio synthesis
Multi-sampling rate support
Fast inference

Use Cases

Speech synthesis
Text-to-speech systems
Serves as the backend vocoder for TTS systems, converting mel spectrograms into natural speech waveforms.
Generates high-quality, natural speech output
Audio enhancement
Audio super-resolution
Converts low-quality audio into high-quality waveforms.
Improves audio quality and clarity
Music generation
Musical instrument sound synthesis
Generates waveform sounds for various musical instruments.
High-quality musical instrument timbre synthesis