TangoFlux
TangoFlux is an efficient text-to-audio generation system that combines flow matching with CLAP-Ranked Preference Optimization (CRPO) to produce high-quality audio quickly.
Release date: 12/24/2024
Model Overview
TangoFlux generates up to 30 seconds of 44.1 kHz audio using FluxTransformer blocks, which combine Diffusion Transformer (DiT) and Multimodal Diffusion Transformer (MMDiT) layers conditioned on a text prompt and a duration embedding.
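Flow matching trains a network to predict a velocity field that carries noise toward data. The pure-Python sketch below shows the common straight-path (rectified flow) form of the training target: a point is sampled on the line between a noise vector and a data vector, and the regression target is their difference. This is a generic illustration under that assumption, not TangoFlux's training code; the function names are hypothetical, and `model` is any callable standing in for the FluxTransformer (real training operates on batched audio latents, not short lists).

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(x0: Sequence[float], x1: Sequence[float], t: float) -> Vector:
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Sequence[float], x1: Sequence[float]) -> Vector:
    """Time derivative of the straight path (x1 - x0), used as the regression target."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(
    model: Callable[[Vector, float], Vector],
    x1: Sequence[float],
    rng: random.Random,
) -> float:
    """Single-sample flow-matching loss: mean squared error between the
    model's predicted velocity at (x_t, t) and the straight-path target."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                         # time drawn uniformly from [0, 1)
    xt = interpolate(x0, x1, t)
    pred = model(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)
```

For a single fixed data point, the velocity `(x1 - x_t) / (1 - t)` equals `x1 - x0` exactly, so a model that outputs it drives this loss to zero; a trained network has to approximate that field across the whole data distribution.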
Model Features
Ultra-fast generation
Generates high-quality audio in few sampling steps. The default is 25 steps; 50 steps are recommended for higher quality.
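The step count is the number of times the sampler evaluates the learned velocity field while integrating from noise toward the output. A minimal sketch, assuming a plain fixed-step Euler solver over t in [0, 1] (TangoFlux's actual solver and time schedule may differ) and a toy analytic velocity field in place of the trained network, shows why more steps reduce discretization error at the cost of more network evaluations. `euler_sample` and `toy_velocity` are hypothetical names.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Sequence[float],
    steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using `steps`
    fixed-size Euler steps; each step costs one velocity evaluation."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float) -> Vector:
    """Toy stand-in for a learned field: dx/dt = 2t, whose exact solution
    moves every coordinate by exactly 1.0 between t=0 and t=1."""
    return [2.0 * t for _ in x]


if __name__ == "__main__":
    for n in (25, 50):
        out = euler_sample(toy_velocity, [0.0], n)
        print(n, "steps -> error", abs(out[0] - 1.0))
```

For this toy field the Euler result is (steps - 1) / steps, so the error is exactly 1 / steps analytically: 0.04 at 25 steps and 0.02 at 50. How much a real model gains from extra steps depends on how curved its learned trajectories are.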
High-fidelity audio
Outputs audio at a 44.1 kHz sampling rate, with clips up to 30 seconds long.
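At 44.1 kHz the sample count grows linearly with clip length: a full 30-second clip is 30 × 44,100 = 1,323,000 samples per channel. The hypothetical helper below computes this and rejects out-of-range durations. The 30-second upper bound comes from the limit stated above; the 1-second lower bound is an assumption for illustration.

```python
SAMPLE_RATE_HZ = 44_100   # output sampling rate stated for TangoFlux
MAX_DURATION_S = 30.0     # maximum clip length stated for TangoFlux
MIN_DURATION_S = 1.0      # assumed lower bound, for illustration only


def num_samples(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples per channel for a clip of `duration_s` seconds."""
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be between {MIN_DURATION_S} and {MAX_DURATION_S} seconds"
        )
    return round(duration_s * sample_rate_hz)
```

For example, `num_samples(10)` gives 441,000 samples, which is the length a 10-second request should produce at this rate.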
Multimodal support
Conditions generation on both a text prompt and a duration embedding, so the length of the output can be specified along with its content.
Three-stage training process
Training consists of pre-training, fine-tuning, and preference optimization. The final stage uses CRPO, which ranks the model's own generated candidates by CLAP score to build preference data for optimization.
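In CRPO, a CLAP model acts as a proxy reward: several candidates are generated per prompt, scored against the prompt, and the highest- and lowest-scoring ones form a preference pair. The sketch below illustrates that idea under stated assumptions: cosine similarity between precomputed embeddings stands in for the CLAP score (the toy vectors are not real CLAP embeddings), and the loss is the standard DPO logistic loss on log-likelihood ratios. TangoFlux adapts preference optimization to flow matching, so its exact objective differs. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as a stand-in for the CLAP text-audio score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def build_preference_pair(
    text_emb: Sequence[float], audio_embs: List[Sequence[float]]
) -> Tuple[int, int]:
    """Score every candidate against the prompt and return the indices of
    (chosen, rejected) = (highest score, lowest score)."""
    if len(audio_embs) < 2:
        raise ValueError("need at least two candidates")
    scores = [cosine(text_emb, a) for a in audio_embs]
    chosen = max(range(len(scores)), key=scores.__getitem__)
    rejected = min(range(len(scores)), key=scores.__getitem__)
    return chosen, rejected


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)),
    where each margin is the policy log-likelihood minus the reference one."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    y = -beta * margin
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow for large |z|
    return math.log1p(math.exp(-abs(y))) + max(y, 0.0)
```

The loss equals log 2 when the policy and reference agree, and it falls below log 2 as the policy raises the likelihood of the CLAP-preferred candidate relative to the rejected one.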
Model Capabilities
Text-to-audio generation
High-fidelity audio generation
Multimodal input processing
Use Cases
Creative content generation
Sound effect generation
Generates specific sound effects based on text descriptions, such as 'a hammer slowly hitting a wooden table'.
Produces high-quality audio files that match the description.
Multimedia applications
Background music generation
Generates background music for videos or games.
Produces background music that matches the scene.