Bigvgan Melspec
A neural vocoder based on BigVGAN, trained on a specific mel-spectrogram input configuration and suited to high-quality audio generation tasks.
Downloads: 16
Release Time: 1/11/2025
Model Overview
This model is a variant of NVIDIA's BigVGAN trained for a specific mel-spectrogram input configuration. It converts mel-spectrograms into waveforms, is used primarily for audio-to-audio conversion tasks, and is capable of generating high-quality audio output.
Model Features
Optimized Mel-Spectrogram Input
Takes mel-spectrograms computed with a specific configuration as input, which may improve audio generation quality
High PESQ Score
Achieved a PESQ score of 4.340 in evaluation, 0.022 below the original NVIDIA checkpoint's score of 4.362
Compatible with Various Mel-Spectrogram Configurations
Supports mel-spectrogram features generated by the vocos library
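To make "specifically configured mel-spectrograms" concrete, the following is a minimal NumPy sketch of log-mel feature extraction. It is not the model's or the vocos library's actual code. The parameter values (24 kHz sample rate, 1024-point FFT, hop of 256, 100 mel bands), the HTK-style mel scale, and the log clipping floor are all assumptions chosen for illustration; the configuration the checkpoint really expects should be taken from its own files.

```python
import numpy as np

# Hypothetical configuration: this listing does not state the real values.
SAMPLE_RATE = 24000
N_FFT = 1024
HOP_LENGTH = 256
N_MELS = 100


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels, f_min=0.0, f_max=None):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the mel scale give the filter edges.
    mel_points = np.linspace(float(hz_to_mel(f_min)), float(hz_to_mel(f_max)), n_mels + 2)
    edges = mel_to_hz(mel_points)
    rising = (fft_freqs[None, :] - edges[:-2, None]) / (edges[1:-1, None] - edges[:-2, None])
    falling = (edges[2:, None] - fft_freqs[None, :]) / (edges[2:, None] - edges[1:-1, None])
    return np.maximum(0.0, np.minimum(rising, falling))


def log_mel_spectrogram(audio, sample_rate=SAMPLE_RATE, n_fft=N_FFT,
                        hop_length=HOP_LENGTH, n_mels=N_MELS):
    """Return a log-mel spectrogram with shape (n_mels, n_frames)."""
    # Reflect-pad so that frame i is centred on sample i * hop_length.
    padded = np.pad(audio, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    frames = np.stack([
        padded[i * hop_length:i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sample_rate, n_fft, n_mels) @ magnitude.T
    # Clip before the log so silent frames do not produce -inf.
    return np.log(np.clip(mel, 1e-7, None))


# One second of a 440 Hz tone as stand-in input.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (100, 94)
```

A vocoder trained on one such configuration generally expects features computed the same way at inference time, so a mismatch in any of these parameters is a likely cause of degraded output.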
Model Capabilities
Audio generation
Mel-spectrogram-to-waveform conversion
High-quality speech synthesis
Use Cases
Speech Synthesis
Text-to-Speech Systems
Used as a neural vocoder for the backend of TTS systems
Generates high-quality speech output
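When a vocoder of this kind sits behind a TTS acoustic model, each mel frame is upsampled to one hop of audio samples, so the output length follows directly from the frame count. The sketch below shows that arithmetic; the hop length of 256 and the 24 kHz sample rate are assumed placeholder values, and the function names are hypothetical helpers, not part of any library.

```python
def vocoder_output_samples(n_frames: int, hop_length: int = 256) -> int:
    """Number of waveform samples produced for n_frames mel frames.

    Assumes the generator's total upsampling factor equals hop_length.
    """
    return n_frames * hop_length


def vocoder_output_seconds(n_frames: int, hop_length: int = 256,
                           sample_rate: int = 24000) -> float:
    """Duration in seconds of the generated audio (assumed settings)."""
    return vocoder_output_samples(n_frames, hop_length) / sample_rate


# Example: 375 mel frames from a hypothetical acoustic model.
print(vocoder_output_samples(375))   # 96000
print(vocoder_output_seconds(375))   # 4.0
```

This is useful for checking that the acoustic model and the vocoder agree on hop length and sample rate: if they do not, the generated speech comes out at the wrong speed or pitch.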
Audio Enhancement
Speech Quality Improvement
Used to enhance the clarity and naturalness of low-quality audio