Higgs Audio V2 Generation 3B Base

Developed by bosonai
Higgs Audio V2 is an audio foundation model pre-trained on over 10 million hours of audio data and a diverse range of text data. It can generate highly expressive audio.
Downloads: 515
Release date: 7/1/2025

Model Overview

Higgs Audio V2 is an audio generation model focused on highly expressive audio. It supports multiple languages and a variety of audio generation tasks.

Model Features

Highly expressive audio generation
The model excels at generating highly expressive audio and automatically adapts prosody and emotion.
Multilingual support
Capable of zero-shot generation of natural multi-speaker dialogues in multiple languages.
Advanced performance
Achieved strong results on multiple benchmarks, outperforming several well-known models, including gpt-4o-mini-tts in the 'Emotion' category of EmergentTTS-Eval.
Unique capabilities
Can automatically adapt prosody, hum melodies zero-shot, and generate speech and background music simultaneously.

Model Capabilities

Text-to-speech conversion
Multilingual dialogue generation
Melody humming generation
Simultaneous generation of speech and background music
Emotional speech generation

Use Cases

Speech generation
Emotional speech generation
Generate speech with rich emotions
Achieved a 75.7% win rate against gpt-4o-mini-tts in the 'Emotion' category of EmergentTTS-Eval
Multilingual dialogue generation
Generate natural multi-speaker dialogues
Performed strongly on a multi-speaker evaluation benchmark
Music generation
Melody humming generation
Generate melody humming in a cloned voice, zero-shot