This model is fine-tuned from Facebook's musicgen-stereo-melody-large using melodic loop samples from the Splice sample library, aiming to generate practical song ideas for music producers.
## Model Features

- **High-Quality Music Generation**: Generates 32 kHz high-fidelity stereo audio suitable for professional music production
- **Song Inspiration**: Specifically designed to give music producers practical song fragments and ideas
- **Melody Guidance Support**: Can generate matching musical content based on an input melody recording
- **Improved Training Data**: Compared to v0.1, the training data volume has tripled and the model size has doubled
## Model Capabilities

- Text-to-music generation
- Unconditional music generation
- Melody-conditioned music generation
- Multi-style music creation
## Use Cases

### Music Production

- **Song Idea Generation**: Provides creative inspiration and starting material for music producers, generating audio clips ready for immediate use in a production
- **Melody Expansion**: Develops simple melody inputs into complete arrangements and richer musical works

### Content Creation

- **Background Music Generation**: Quickly creates custom background music that matches the atmosphere of videos, podcasts, and other content
# Model Card for musicgen-songstarter-v0.2
musicgen-songstarter-v0.2 is a fine-tune of musicgen-stereo-melody-large on a dataset of melody loops from my Splice sample library. It's intended to be used to generate song ideas that are useful for music producers. It generates stereo audio at 32 kHz.
**Update:** I wrote a blog post detailing how and why I trained this model, including training details, the dataset, Weights and Biases logs, etc.
After installing audiocraft, you should be able to load this model just like any other MusicGen checkpoint here on the Hub:
```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.

# Unconditional generation: 4 samples with no text or melody prompt.
wav = model.generate_unconditional(4)

# Text-conditioned generation from tag-style descriptions.
descriptions = ['acoustic, guitar, melody, trap, d minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

# Melody-conditioned generation: uses the melody from the given audio
# together with the provided descriptions.
melody, sr = torchaudio.load('./assets/bach.mp3')
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```
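If you're working in a notebook, one convenient way to audition a generation without writing files is IPython's audio widget. This is a small convenience sketch, not part of the original example; it assumes `one_wav` is a 2-D `(channels, samples)` tensor as produced by the loop above.

```python
from IPython.display import Audio

# one_wav has shape (channels, samples); IPython's Audio widget accepts a
# 2-D array in that layout together with the sample rate.
Audio(one_wav.cpu().numpy(), rate=model.sample_rate)
```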
For some example tags, [see the prompt format section of musicgen-songstarter-v0.1's readme](https://huggingface.co/nateraw/musicgen-songstarter-v0.1#prompt-format). The tags there are for the smaller v0.1 dataset, but should give you an idea of what the model saw.
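As a rough illustration, prompts follow the comma-separated tag style used in the snippet above: a handful of instrument/genre/mood tags followed by a key and a tempo. The prompts below (other than the first, which is taken from the example above) are made up for illustration and are not drawn from the training data.

```python
# Hypothetical prompts in the comma-separated tag format: tags, key, bpm.
descriptions = [
    'acoustic, guitar, melody, trap, d minor, 90 bpm',   # from the example above
    'synth, plucks, dark, hip hop, e minor, 140 bpm',    # made-up illustration
    'piano, chords, lofi, jazz, g major, 75 bpm',        # made-up illustration
]
wav = model.generate(descriptions)
```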
For more verbose details, you can check out the blog post.
## Training Information

| Property | Details |
| --- | --- |
| Code | Repo is here. It's an undocumented fork of facebookresearch/audiocraft where I rewrote the training loop with PyTorch Lightning, which worked a bit better for me. |
| Training Data | Around 1,700-1,800 samples I manually listened to and purchased via my personal Splice account, about 7-8 hours of audio. Given the licensing terms, I cannot share the data. |
| Hardware | 8x A100 40GB instance from Lambda Labs |
| Procedure | Trained for 10k steps, which took about 6 hours. Segment duration at train time was reduced to 15 seconds. |
| Hyperparameters/Logs | See the wandb run, which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used when I ran the training script. |
## License

This model is released under the CC-BY-NC-4.0 license.
## Acknowledgements

This work would not have been possible without:

- Lambda Labs, for subsidizing larger training runs by providing some compute credits
- Replicate, for early development compute resources