Tango 2 Open-source Text-to-Audio Model - Free Deployment to Generate High-quality Audio

Tango2

Developed by declare-lab

Tango 2 is an improved text-to-audio generation model built on Tango. It raises audio generation quality through DPO alignment training.

Audio Generation

Transformers

English #Text-to-Audio Diffusion Model #DPO Alignment Optimization #Multi-scenario Sound Effect Generation

Downloads 147

Release Time: 4/13/2024

Model Overview

Tango 2 is a diffusion-based text-to-audio generation model aligned with human audio preferences through Direct Preference Optimization (DPO). It generates high-quality audio from text prompts.

Model Features

DPO Alignment Training

Uses the audio-alpaca pairwise preference dataset for Direct Preference Optimization to improve audio generation quality.

High-Quality Audio Generation

Samples with 100 diffusion steps by default; 200 steps are recommended for more natural and realistic audio, at the cost of longer run time.

Batch Generation Capability

Generates multiple audio samples per prompt for a batch of text prompts in a single call.

Model Capabilities

Text-to-Audio Conversion

High-Quality Audio Generation

Batch Audio Generation

Use Cases

Sound Effect Production

Environmental Sound Generation

Generate natural environmental sounds based on text descriptions

Produces realistic environmental sounds like water flow, wind, etc.

Event Sound Effect Generation

Generate sound effects for specific events such as applause or cheers

Creates vivid sound effects matching scene descriptions

Media Production

Film/TV Score Generation

Generate background music based on scene descriptions

Produces music segments that match the scene atmosphere

🚀 Tango 2: Aligning Diffusion-based Text-to-Audio Generative Models through Direct Preference Optimization

We developed Tango 2 based on Tango for text-to-audio generation. Tango 2 was initialized with the Tango-full-ft checkpoint and trained using Direct Preference Optimization (DPO) on audio-alpaca, a pairwise text-to-audio preference dataset.
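
To make the idea of preference optimization concrete, here is a minimal sketch of the generic DPO loss for a single preference pair, written on plain scalar log-likelihoods. It is not the Tango 2 training code: the diffusion setting needs its own formulation, which the paper describes. The function names, argument names, and the default `beta` are assumptions chosen for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) pair. Illustrative only."""
    # How much more the trained model likes each output than the frozen reference.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)
```

The loss falls below log 2 when the model favors the preferred output more strongly than the reference does, and rises above it in the opposite case.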

Read the paper

🚀 Quick Start

Download and Generate Audio

Download the Tango 2 model and generate audio from a text prompt:

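import IPython
import soundfile as sf
from tango import Tango

# Load the Tango 2 checkpoint (downloaded on first use, then read from cache)
tango = Tango("declare-lab/tango2")

prompt = "An audience cheering and clapping"
audio = tango.generate(prompt)

# Save the waveform at 16 kHz, then play it back inside a notebook
sf.write(f"{prompt}.wav", audio, samplerate=16000)
IPython.display.Audio(data=audio, rate=16000)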

The model will be automatically downloaded and saved in cache. Subsequent runs will load the model directly from cache.
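
If you want to check where the cached files end up, the sketch below resolves the default Hugging Face Hub cache directory from the usual environment variables. It assumes the checkpoint is fetched through the Hugging Face Hub, which this page does not state explicitly; the helper name `hf_hub_cache_dir` is hypothetical.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=None):
    """Best-effort guess at the Hugging Face Hub cache directory.

    Assumption: the model is downloaded via the Hugging Face Hub.
    """
    env = os.environ if env is None else env
    # An explicit cache location takes priority.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise the cache lives in the "hub" folder under HF_HOME.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # Fall back to the conventional per-user cache directory.
    base = Path(env["XDG_CACHE_HOME"]) if env.get("XDG_CACHE_HOME") else Path.home() / ".cache"
    return base / "huggingface" / "hub"
```

Calling `hf_hub_cache_dir()` with no arguments reads the current process environment.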

Adjust Sampling Steps

The generate function samples from the latent diffusion model with 100 steps by default. We recommend 200 steps for higher-quality audio, at the cost of increased run time.

prompt = "Rolling thunder with lightning strikes"
audio = tango.generate(prompt, steps=200)
IPython.display.Audio(data=audio, rate=16000)
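
To see what the extra steps cost on your own hardware, a small timing helper can be useful. The sketch below only assumes a callable that accepts a prompt and a `steps` keyword, as `tango.generate` does in the example above; the helper name `time_generation` is hypothetical.

```python
import time


def time_generation(generate, prompt, step_counts=(100, 200)):
    """Return {steps: seconds} for one call of generate() per step count."""
    timings = {}
    for steps in step_counts:
        start = time.perf_counter()
        generate(prompt, steps=steps)  # output is discarded; only timing matters here
        timings[steps] = time.perf_counter() - start
    return timings


# Example (requires a loaded model):
# print(time_generation(tango.generate, "Rolling thunder with lightning strikes"))
```

A single run per setting is noisy, so repeat the measurement a few times before drawing conclusions.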

Generate Multiple Samples

Use the generate_for_batch function to generate multiple audio samples for a batch of text prompts:

prompts = [
    "A car engine revving",
    "A dog barks and rustles with some clicking",
    "Water flowing and trickling"
]
audios = tango.generate_for_batch(prompts, samples=2)

This will generate two samples for each of the three text prompts.
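
When saving batch results, using the raw prompt as a file name (as in the single-prompt example) can fail for prompts that contain slashes or other special characters. The sketch below builds safe, distinct file names for every prompt and sample index. The helper names are hypothetical, and the commented usage assumes that generate_for_batch returns one group of waveforms per prompt, in prompt order, which this page does not state.

```python
import re


def prompt_to_filename(prompt, sample_index, max_length=60):
    """Turn a text prompt into a filesystem-safe .wav file name."""
    # Replace every run of non-alphanumeric characters with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    slug = slug[:max_length].rstrip("_") or "audio"
    return f"{slug}_{sample_index}.wav"


def plan_batch_filenames(prompts, samples):
    """One list of file names per prompt, one name per sample."""
    return [[prompt_to_filename(p, i) for i in range(samples)] for p in prompts]


# Example (assumes audios[k][i] is sample i for prompts[k]; verify before relying on it):
# for names, group in zip(plan_batch_filenames(prompts, 2), audios):
#     for name, wave in zip(names, group):
#         sf.write(name, wave, samplerate=16000)
```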

📦 Installation

Our code is released here: https://github.com/declare-lab/tango

Please follow the instructions in the repository for installation, usage and experiments.

📄 License

This project is licensed under the CC BY-NC-SA 4.0 license.

📚 Documentation

Model Information

Property	Details
Model Type	Text-to-Audio Generative Model
Training Data	bjoernp/AudioCaps, declare-lab/audio_alpaca
Pipeline Tag	Text-to-Audio
Tags	Text-to-Audio
