TangoFlux

Developed by declare-lab
TangoFlux is an efficient text-to-audio generation system that combines flow matching with CLAP-ranked preference optimization (CRPO) to produce high-quality audio quickly.
Downloads 727
Release Time: 12/24/2024

Model Overview

TangoFlux generates up to 30 seconds of audio at a 44.1 kHz sampling rate. Its FluxTransformer module is built from diffusion transformer and multimodal diffusion transformer blocks, and generation is conditioned on a text prompt and a duration embedding.
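To make the 44.1 kHz / 30-second limit concrete, here is a small arithmetic sketch of how many samples per channel a clip of a given length contains. The constant and function names are illustrative and not part of TangoFlux; the channel count is not stated on this page, so the figure is per channel.

```python
SAMPLE_RATE_HZ = 44_100   # sampling rate stated above
MAX_DURATION_S = 30.0     # maximum clip length stated above


def samples_per_channel(duration_s: float) -> int:
    """Return the number of samples per channel for a clip of `duration_s` seconds.

    Illustrative helper (not a TangoFlux API): rejects durations outside (0, 30].
    """
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    return round(duration_s * SAMPLE_RATE_HZ)
```

A full 30-second clip therefore holds 30 x 44,100 = 1,323,000 samples per channel.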

Model Features

Ultra-fast generation
Generates high-quality audio quickly. Sampling defaults to 25 steps; 50 steps are recommended for higher quality.
High-fidelity audio
Outputs audio at a 44.1 kHz sampling rate, with clips up to 30 seconds long.
Multimodal support
Conditions generation on two inputs: a text prompt and a duration embedding.
Three-stage training process
Trained in three stages: pre-training, fine-tuning, and preference optimization. The preference optimization stage uses the CRPO method.
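The step counts listed under "Ultra-fast generation" trade speed against accuracy. Flow-matching models generate a sample by numerically integrating a velocity field from t = 0 (noise) to t = 1 (data), and each step is one evaluation of that field. The sketch below is only an illustration under stated assumptions: it uses a fixed-step Euler solver (this page does not say which solver TangoFlux uses) and a hypothetical one-dimensional velocity field with a known closed-form solution in place of the trained network.

```python
import math
from typing import Callable


def euler_sample(velocity: Callable[[float, float], float], x0: float, steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using `steps` Euler steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    dt = 1.0 / steps
    x = x0
    for k in range(steps):
        t = k * dt
        x += dt * velocity(x, t)  # one "sampling step" = one velocity evaluation
    return x


def toy_velocity(x: float, t: float) -> float:
    # Hypothetical stand-in for the learned network; chosen because
    # dx/dt = -2x with x(0) = 1 has the exact solution x(1) = exp(-2).
    return -2.0 * x


EXACT_ENDPOINT = math.exp(-2.0)


def integration_error(steps: int) -> float:
    """Absolute error of the Euler result against the closed-form endpoint."""
    return abs(euler_sample(toy_velocity, 1.0, steps) - EXACT_ENDPOINT)


if __name__ == "__main__":
    for steps in (25, 50):
        print(f"{steps} steps: error = {integration_error(steps):.5f}")
```

In this toy setting, doubling the step count from 25 to 50 roughly halves the integration error while doubling the number of velocity evaluations, which is the same speed-versus-quality trade-off described above.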

Model Capabilities

Text-to-audio generation
High-fidelity audio generation
Multimodal input processing

Use Cases

Creative content generation
Sound effect generation
Generates specific sound effects based on text descriptions, such as 'a hammer slowly hitting a wooden table'.
Produces high-quality audio files that match the description.
Multimedia applications
Background music generation
Generates background music for videos or games.
Produces background music that matches the scene.
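As a hedged sketch of the sound-effect use case, the function below follows the usage pattern published in the declare-lab/TangoFlux README as best recalled here. The `tangoflux` package, the `TangoFluxInference` class, and the `generate(prompt, steps=..., duration=...)` signature are assumptions that do not appear on this page, so check them against the project's current documentation. `generate_sound_effect` itself is a hypothetical wrapper name.

```python
def generate_sound_effect(prompt: str, out_path: str, steps: int = 50, duration: int = 10) -> None:
    """Generate a clip from `prompt` and write it to `out_path` as a 44.1 kHz file.

    Assumes the `tangoflux` and `torchaudio` packages are installed; the
    TangoFlux class and argument names below are taken from the project's
    README as recalled and should be verified before use.
    """
    # Imported lazily so this module loads without the heavy dependencies.
    import torchaudio
    from tangoflux import TangoFluxInference

    model = TangoFluxInference(name="declare-lab/TangoFlux")
    # steps=50 follows the higher-quality recommendation; duration is in seconds (max 30).
    audio = model.generate(prompt, steps=steps, duration=duration)
    torchaudio.save(out_path, audio, 44100)
```

Under those assumptions, calling `generate_sound_effect("a hammer slowly hitting a wooden table", "hammer.wav")` would download the model weights on first use and write a 10-second clip.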