Tango2-full: Open-source Text-to-Audio Model - Free Deployment for High-quality Audio Content Generation

Tango2 Full

Developed by declare-lab

Tango 2 is an improved text-to-audio generation model built on Tango. It is aligned for audio generation using Direct Preference Optimization (DPO).

Audio Generation

Transformers

English #Text-to-Audio Generation #Diffusion Model Optimization #Preference Alignment Training

Release date: 4/13/2024

Model Overview

Tango 2 is a diffusion-based text-to-audio generation model. Starting from the Tango-full-ft checkpoint, it is aligned with DPO on Audio-alpaca, a pairwise text-to-audio preference dataset, and generates high-quality audio from text descriptions.

Model Features

Direct Preference Optimization (DPO)

Uses DPO for alignment training to improve both the quality of the generated audio and how closely it matches the text description.

Expanded Training Dataset

Trained on an extended version of the Audio-alpaca dataset, with the aim of improving the model's generalization.

High-Quality Audio Generation

Samples with 100 diffusion steps by default; 200 steps are recommended for higher-quality audio, at the cost of longer run time.

Model Capabilities

Text-to-Audio Conversion

Batch Audio Generation

Scene Sound Effect Synthesis

Use Cases

Multimedia Production

Sound Effect Generation

Generates scene-specific sound effects from text descriptions.

For example, it can produce high-quality thunder or cheering sound effects.

Background Music Synthesis

Generates matching background music from scene descriptions.

Game Development

Game Sound Effect Production

Quickly generates the various sound effects needed for game scenes.

🚀 Tango 2: Aligning Diffusion-based Text-to-Audio Generative Models through Direct Preference Optimization

We developed Tango 2 based on Tango for text-to-audio generation. Tango 2 was initialized with the Tango-full-ft checkpoint and trained using Direct Preference Optimization (DPO) on audio-alpaca, a pairwise text-to-audio preference dataset. Tango-2-full was trained on an extended version of Audio-alpaca.
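For readers curious how DPO applies to a diffusion model, here is a minimal sketch of a Diffusion-DPO style loss for a single preference pair at a single timestep. It assumes the diffusion-DPO formulation (Wallace et al.) that the Tango 2 paper adapts to audio latents. The function names, the plain-list inputs, the illustrative beta value, and the omission of timestep weighting are simplifications for illustration, not the project's training code.

```python
import math
from typing import Sequence


def mse(target: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def diffusion_dpo_loss(
    noise_w: Sequence[float], policy_w: Sequence[float], ref_w: Sequence[float],
    noise_l: Sequence[float], policy_l: Sequence[float], ref_l: Sequence[float],
    beta: float,
) -> float:
    """Diffusion-DPO style loss for one preference pair at one timestep (sketch).

    *_w refers to the preferred audio, *_l to the dispreferred audio.
    noise_*  : noise actually added to that audio's latent
    policy_* : noise predicted by the model being trained
    ref_*    : noise predicted by the frozen reference model (e.g. Tango-full-ft)
    beta     : strength of the preference term (illustrative, not a published value)
    Timestep weighting and the average over timesteps are omitted for brevity.
    """
    # A negative gap means the trained model denoises this audio better than the reference.
    win_gap = mse(noise_w, policy_w) - mse(noise_w, ref_w)
    lose_gap = mse(noise_l, policy_l) - mse(noise_l, ref_l)
    # Reward improving more on the preferred audio than on the dispreferred audio.
    logit = -beta * (win_gap - lose_gap)
    # -log(sigmoid(logit)) rewritten as softplus(-logit) for numerical stability.
    return softplus(-logit)


if __name__ == "__main__":
    same = [0.1, -0.2]
    # Trained model identical to the reference: the loss is log(2).
    print(diffusion_dpo_loss(same, same, same, same, same, same, beta=1000.0))
```

When the trained model and the reference make identical predictions, the loss is log 2 (about 0.693). It drops below that as the trained model, relative to the reference, denoises the preferred audio better than the dispreferred audio.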

Read the paper

🚀 Quick Start

Download and Generate Audio

Download the Tango 2 model and generate audio from a text prompt:

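import IPython.display
import soundfile as sf
from tango import Tango

# Downloads declare-lab/tango2-full on first use, then loads it from the local cache
tango = Tango("declare-lab/tango2-full")

prompt = "An audience cheering and clapping"
audio = tango.generate(prompt)  # 100 sampling steps by default
sf.write(f"{prompt}.wav", audio, samplerate=16000)  # save as a 16 kHz WAV file
IPython.display.Audio(data=audio, rate=16000)  # play inline in a Jupyter notebook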

The model will be automatically downloaded and saved in cache. Subsequent runs will load the model directly from cache.

Adjusting Generation Steps

By default, the generate function takes 100 steps to sample from the latent diffusion model. We recommend 200 steps for higher-quality audio, at the cost of longer run time.

prompt = "Rolling thunder with lightning strikes"
audio = tango.generate(prompt, steps=200)
IPython.display.Audio(data=audio, rate=16000)
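Because the extra quality of 200 steps is paid for in run time, it can help to measure that cost on your own hardware. The sketch below times any generate-style callable at several step counts. It only assumes the callable accepts a steps keyword, as tango.generate does above; the helper name time_steps and the stand-in generator are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable


def time_steps(
    generate: Callable[..., object],
    prompt: str,
    steps_options: Iterable[int] = (100, 200),
) -> Dict[int, float]:
    """Return wall-clock seconds spent generating `prompt` at each step count."""
    timings: Dict[int, float] = {}
    for steps in steps_options:
        start = time.perf_counter()
        generate(prompt, steps=steps)  # the generated audio is discarded here
        timings[steps] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in generator so the sketch runs without the model; pass tango.generate instead.
    def fake_generate(prompt: str, steps: int = 100) -> list:
        time.sleep(steps / 10_000)  # pretend the cost grows linearly with steps
        return [0.0] * 16_000

    print(time_steps(fake_generate, "Rolling thunder with lightning strikes"))
```

With the model loaded, time_steps(tango.generate, prompt) returns a dict mapping 100 and 200 to elapsed seconds. The first call may include one-off warm-up cost, so a short throwaway generation beforehand gives a fairer comparison.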

Batch Generation

Use the generate_for_batch function to generate multiple audio samples for a batch of text prompts:

prompts = [
    "A car engine revving",
    "A dog barks and rustles with some clicking",
    "Water flowing and trickling"
]
audios = tango.generate_for_batch(prompts, samples=2)

This will generate two samples for each of the three text prompts.
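Naming each file after its prompt, as in the quick-start example, can fail for prompts containing characters such as "/". The sketch below builds filesystem-safe names and saves every sample. It assumes that with samples greater than 1, generate_for_batch returns one list of waveforms per prompt (audios[i][j] is sample j of prompts[i]); check this against your installed version. safe_stem, batch_output_paths and save_batch are hypothetical helpers, not part of the tango package.

```python
import re
from typing import Callable, List, Sequence


def safe_stem(prompt: str, max_len: int = 60) -> str:
    """Turn a free-text prompt into a filesystem-safe file stem."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", prompt).strip("_")
    return stem[:max_len] or "audio"


def batch_output_paths(prompts: Sequence[str], samples: int) -> List[List[str]]:
    """One list of WAV filenames per prompt, one filename per sample."""
    # The numeric prompt index keeps names unique even if two stems collide.
    return [
        [f"{i:02d}_{safe_stem(p)}_s{j}.wav" for j in range(samples)]
        for i, p in enumerate(prompts)
    ]


def save_batch(audios, prompts: Sequence[str], samples: int,
               write: Callable[[str, object], None]) -> None:
    """Write every sample; assumes audios[i][j] is sample j for prompts[i]."""
    for paths, prompt_audios in zip(batch_output_paths(prompts, samples), audios):
        for path, audio in zip(paths, prompt_audios):
            write(path, audio)
```

With the batch above, save_batch(audios, prompts, 2, lambda path, a: sf.write(path, a, samplerate=16000)) would write six files, from 00_A_car_engine_revving_s0.wav to 02_Water_flowing_and_trickling_s1.wav. Passing the writer in as a callable keeps the helpers independent of soundfile.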

📦 Installation

Our code is released here: https://github.com/declare-lab/tango

Please follow the instructions in the repository for installation, usage and experiments.

📄 License

This project is licensed under the CC BY-NC-SA 4.0 license.

📚 Documentation

Model Information

Property	Details
Pipeline Tag	text-to-audio
Tags	text-to-audio
Datasets	bjoernp/AudioCaps, declare-lab/audio-alpaca
Model Type	Tango 2, initialized with Tango-full-ft checkpoint and aligned using DPO on audio-alpaca
Training Data	Extended version of Audio-alpaca
