Tango Full

Developed by declare-lab
TANGO is a text-to-audio generation model based on latent diffusion. From a text prompt it can produce realistic audio, including human voices, animal sounds, and natural or artificial sound effects.
Downloads 15
Release Time: 5/30/2023

Model Overview

TANGO uses the instruction-tuned large language model Flan-T5 as its text encoder, keeping the encoder's parameters frozen, and trains a UNet-based diffusion model for audio generation. It reportedly outperforms prior state-of-the-art audio generation models in both objective metrics and subjective evaluations.
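
The diffusion training idea behind the overview above can be illustrated with a small sketch. It shows only the generic forward (noising) step of a diffusion model and the noise-prediction loss that a denoiser such as a UNet is usually trained on. The linear schedule, its values, the number of steps, and all function names are assumptions for illustration, not TANGO's actual configuration or code.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, evenly spaced (illustrative values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): how much signal survives at step t."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


def add_noise(latent, alpha_bar_t, rng):
    """Forward diffusion: mix a clean latent with Gaussian noise at one timestep."""
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(latent, noise)]
    return noisy, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and actual noise."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(squared)


rng = random.Random(0)
schedule = alpha_bars(linear_betas(1000))
clean_latent = [0.5, -1.0, 0.25, 2.0]  # stand-in for an encoded audio latent

noisy_latent, noise = add_noise(clean_latent, schedule[500], rng)
# A real denoiser would receive the noisy latent, the timestep, and the
# frozen text encoder's embedding, and would predict `noise`. An all-zero
# guess is used here only to show how the loss is computed.
loss = noise_prediction_loss([0.0] * len(noise), noise)
```

In a setup like the one described, only the denoiser's weights would be updated to reduce this loss, while the text encoder stays fixed and simply supplies the conditioning embedding.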

Model Features

High-quality audio generation: generates realistic audio, including human voices, animal sounds, and natural or artificial sound effects.
Instruction-guided diffusion: uses the instruction-tuned large language model Flan-T5 as the text encoder for precise text-to-audio conversion.
State-of-the-art performance: reportedly outperforms prior state-of-the-art audio generation models in both objective metrics and subjective evaluations.

Model Capabilities

Text-to-audio generation
Multi-category sound synthesis
High-quality audio rendering
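
As a rough illustration of text-to-audio generation in practice, the sketch below wraps a single prompt-to-file call. It assumes the `tango` package from the declare-lab/tango repository exposes a `Tango` class with a `generate` method, that the checkpoint name `declare-lab/tango-full` is valid, and that output is sampled at 16 kHz. These details are recalled from the project's README and should be verified against it. `prompt_to_filename` and `generate_to_file` are hypothetical helpers, not part of any library.

```python
import re


def prompt_to_filename(prompt, extension="wav"):
    """Turn a text prompt into a safe file name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{slug}.{extension}"


def generate_to_file(prompt, model_name="declare-lab/tango-full", sample_rate=16000):
    """Generate audio for one prompt and save it as a WAV file."""
    # Imported lazily so prompt_to_filename works without the heavy dependencies.
    import soundfile as sf   # assumed to be installed
    from tango import Tango  # assumed: class from the declare-lab/tango repository

    model = Tango(model_name)       # assumed constructor; fetches the checkpoint
    audio = model.generate(prompt)  # assumed call, per the project README
    path = prompt_to_filename(prompt)
    sf.write(path, audio, samplerate=sample_rate)
    return path
```

Calling `generate_to_file("An audience cheering and clapping")` would, under these assumptions, write `an_audience_cheering_and_clapping.wav` to the current directory.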

Use Cases

Entertainment & Media
Sound effects production: quickly generate high-quality sound effects for films, games, and other content, including realistic environmental and special-effect sounds.
Education
Teaching assistance: generate accompanying audio for educational content to create vivid teaching materials.