Text-to-Music Open Source Music Generation Model - Generate ABC Notation Scores for Free Based on Text Descriptions

Text To Music

Developed by sander-wood

A text-conditioned symbolic music generation model based on BART-base architecture that can generate ABC notation scores from natural language descriptions

Text-to-Audio

Transformers

Language: English

Open Source License: MIT

Tags: #Text-conditioned music generation #ABC notation output #Multi-style music composition

Downloads 405

Release Time : 11/21/2022

Model Overview

This model can directly generate complete and semantically coherent musical scores from text descriptions, supporting various music styles such as blues, classical, folk, etc.

Model Features

Real text-music pair training

According to its authors, the first text-conditioned symbolic music generation model trained on real text-music pairs

Multi-style support

Supports generating scores in various styles including blues, classical, folk, jazz, pop, and world music

End-to-end generation

Music generation is entirely done by the model without any manual rules

Model Capabilities

Text-to-music generation

ABC notation output

Multi-style music composition

Use Cases

Music composition

Traditional Irish dance music generation

Generate traditional Irish dance music from text descriptions

Generates a 6/8 time D major dance score matching the description

Personalized music creation

Generate personalized music works based on user-provided text descriptions

Generates original music semantically matching the description

🚀 Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

This project focuses on a language-music model that can generate sheet music from natural language descriptions. It offers a new approach to text-conditional symbolic music generation, trained on real text-music pairs without hand-crafted rules.

🚀 Quick Start

Model Initialization

You can initialize the model using the following code:

import torch
from samplings import top_p_sampling, temperature_sampling
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained('sander-wood/text-to-music')
model = AutoModelForSeq2SeqLM.from_pretrained('sander-wood/text-to-music')

Generation Process

Here is the process to generate music from text:

max_length = 1024
top_p = 0.9
temperature = 1.0

text = "This is a traditional Irish dance music."
input_ids = tokenizer(text, 
                      return_tensors='pt', 
                      truncation=True, 
                      max_length=max_length)['input_ids']

decoder_start_token_id = model.config.decoder_start_token_id
eos_token_id = model.config.eos_token_id

decoder_input_ids = torch.tensor([[decoder_start_token_id]])

for t_idx in range(max_length):
    # Run the model on the text and on the tokens generated so far
    outputs = model(input_ids=input_ids,
                    decoder_input_ids=decoder_input_ids)
    # Convert the logits at the last position into next-token probabilities
    logits = outputs.logits[0][-1]
    probs = torch.nn.Softmax(dim=-1)(logits).detach().numpy()
    # Apply top-p filtering, then sample one token id with temperature
    sampled_id = temperature_sampling(probs=top_p_sampling(probs,
                                                           top_p=top_p,
                                                           return_probs=True),
                                      temperature=temperature)
    decoder_input_ids = torch.cat((decoder_input_ids, torch.tensor([[sampled_id]])), 1)
    # Stop at the end-of-sequence token and print the finished tune
    if sampled_id == eos_token_id:
        tune = "X:1\n"
        tune += tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True)
        print(tune)
        break
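
To make the top-p step in the loop above more concrete, here is a minimal pure-Python sketch of nucleus (top-p) filtering on a toy distribution. It is an illustration of the general technique, not the implementation inside the samplings package; the names top_p_filter, sample_id, and toy_probs are hypothetical.

```python
import random


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose total probability
    reaches top_p, zero out the rest, and renormalize."""
    # Rank token ids from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for idx in ranked:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break
    # Renormalize the surviving probabilities so they sum to 1.
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample_id(probs, rng=random):
    """Draw one token id according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy next-token distribution over a 4-token vocabulary (illustrative only).
toy_probs = [0.5, 0.3, 0.15, 0.05]
filtered = top_p_filter(toy_probs, top_p=0.9)
print(filtered)             # the least likely token is zeroed out
print(sample_id(filtered))  # one of the token ids that survived filtering
```

With top_p = 0.9 the three most likely tokens are kept because together they first reach the threshold; the last token can never be sampled.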

✨ Features

Text-Conditional Generation: Generate complete and semantically consistent sheet music directly from natural language descriptions.
Diverse Styles: The generated music covers a variety of styles, including blues, classical, folk, jazz, pop, and world music.
Online Experience: Available for online use and experience on Textune: Generating Tune from Text.

📦 Installation

The model is loaded through the transformers library. The Quick Start code also imports torch and samplings, so install all three packages:

pip install transformers torch samplings

💻 Usage Examples

Basic Usage

Here is a basic example of generating music from text:

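Run the Model Initialization and Generation Process code from the Quick Start section above. It prints a single tune in ABC notation for the given text.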

Advanced Usage

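You can adjust top_p, temperature, and max_length to get different generation results. Lower top_p or temperature values make sampling more conservative, and higher values make it more varied. In the Quick Start code, max_length caps both the tokenized input length and the number of generated tokens.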
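
The effect of temperature can be illustrated on a toy distribution. The sketch below uses the standard formulation, raising each probability to the power 1/T and renormalizing, which is equivalent to dividing the logits by T before the softmax. It is an assumption that this matches what temperature_sampling does internally; apply_temperature and toy_probs are hypothetical names.

```python
def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution: p_i ** (1 / T), renormalized."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy next-token distribution (illustrative only).
toy_probs = [0.6, 0.3, 0.1]
for t in (0.5, 1.0, 2.0):
    # T < 1 sharpens the distribution, T > 1 flattens it.
    print(t, [round(p, 3) for p in apply_temperature(toy_probs, t)])
```

A temperature below 1 concentrates probability on the most likely token, while a temperature above 1 gives less likely tokens a larger share.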

Generation Examples

Here are some examples generated by this model without cherry-picking:

######################## INPUT TEXT ########################

This is a traditional Irish dance music.
Note Length-1/8
Meter-6/8
Key-D

####################### OUTPUT TUNES #######################

X:1
L:1/8
M:6/8
K:D
 A | BEE BEE | Bdf edB | BAF FEF | DFA BAF | BEE BEE | Bdf edB | BAF DAF | FED E2 :: A |
 Bef gfe | faf edB | BAF FEF | DFA BAF | Bef gfe | faf edB | BAF DAF | FED E2 :|

X:2
L:1/8
M:6/8
K:D
 A |: DED F2 A | d2 f ecA | G2 B F2 A | E2 F GFE | DED F2 A | d2 f ecA | Bgf edc |1 d3 d2 A :|2
 d3 d2 a || a2 f d2 e | f2 g agf | g2 e c2 d | e2 f gfe | fed gfe | agf bag | fed cde | d3 d2 a |
 agf fed | Adf agf | gfe ecA | Ace gfe | fed gfe | agf bag | fed cde | d3 d2 ||

X:3
L:1/8
M:6/8
K:D
 BEE BEE | Bdf edB | BAF FEF | DFA dBA | BEE BEE | Bdf edB | BAF FEF |1 DED DFA :|2 DED D2 e |:
 faf edB | BAF DFA | BAF FEF | DFA dBA | faf edB | BAF DFA | BdB AFA |1 DED D2 e :|2 DED DFA ||

######################## INPUT TEXT ########################

This is a jazz-swing lead sheet with chord and vocal.

####################### OUTPUT TUNES #######################

X:1
L:1/8
M:4/4
K:F
"F" CFG |"F" A6 z G |"Fm7" A3 G"Bb7" A3 G |"F" A6 z G |"F7" A4"Eb7" G4 |"F" F6 z F |
"Dm" A3 G"Dm/C" A3 G |"Bb" A2"Gm" B2"C7" G3 G |"F" F8- |"Dm7""G7" F6 z2 |"C" C4 C3 C |
"C7" C2 B,2"F" C4 |"F" C4 C3 C |"Dm" D2 C2"Dm/C" D4 |"Bb" D4 D3 D |"Bb" D2 C2"C7" D4 |"F" C8- |
"F" C4"Gm" z C"C7" FG |"F" A6 z G |"Fm7" A3 G"Bb7" A3 G |"F" A6 z G |"F7" A4"Eb7" G4 |"F" F6 z F |
"Dm" A3 G"Dm/C" A3 G |"Bb" A2"Gm" B2"C7" G3 G |"F" F8- |"F" F6 z2 |]

X:2
L:1/4
M:4/4
K:F
"^A""F" A3 A |"Am7" A2"D7" A2 |"Gm7" G2"C7" G A |"F" F4 |"F" A3 A |"Am7" A2"D7" A2 |"Gm7" G2"C7" G A |
"F" F4 |"Gm" B3 B |"Am7" B2"D7" B2 |"Gm" B2"D7" B A |"Gm7" G4 |"F" A3 A |"Am7" A2"D7" A2 |
"Gm7" G2"C7" G A |"F" F4 |"Bb7" F3 G |"F" A2 A2 |"Gm" B2"C7" B2 |"F" c2"D7" c c |"Gm7" c2"C7" B2 |
"F" A2"F7" A2 |"Bb" B2"F" B A |"Bb" B2"F" B A |"Gm" B2"F" B A |"Gm7" B2"F" B A |"Gm7" B2"F" B A |
"C7" B2 c2 |"F""Bb7" A4 |"F""Bb7" z4 |]

X:3
L:1/4
M:4/4
K:Bb
 B, ||"Gm""^A1" G,2 B, D |"D7" ^F A2 G/=F/ |"Gm" G2"Cm7" B c |"F7" A2 G =F |"Bb" D2 F A |
"Cm7" c e2 d/c/ |"Gm7" B3/2 G/-"C7" G2- |"F7" G2 z B, |"Gm""^B" G,2 B, D |"D7" ^F A2 G/=F/ |
"Gm" G2"Cm7" B c |"F7" A2 G =F |"Bb" D2 F A |"Cm7" c e2 d/c/ |"Gm7" B3/2 G/-"C7" G2- |"F7" G2 z2 ||
"^C""F7""^A2" F4- | F E D C |"Bb" D2 F B | d3 c/B/ |"F" A2"Cm7" G2 |"D7" ^F2 G2 |"Gm" B3"C7" A |
"F7" G4 ||"F7""^A3" F4- | F E D C |"Bb" D2 F B | d3 c/B/ |"F" A2"Cm7" G2 |"D7" ^F2 G2 |"Gm" B3 A |
"C7" G4 ||"^B""Gm""^C" B2 c B |"Cm" c B c B |"Gm7" c2 B A |"C7" B3 A |"Bb" B2 c B |"G7" d c B A |
"Cm" G2 A G |"F7" F2 z G ||"^C""F7" F F3 |"Bb" D D3 |"Cm" E E3 |"D7" ^F F3 |"Gm" G2 A B |"C7" d3 d |
"Gm" d3 d |"D7" d3 B, ||"^D""Gm" G,2 B, D |"D7" ^F A2 G/=F/ |"Gm" G2"Cm7" B c |"F7" A2 G =F |
"Bb" D2 F A |"Cm7" c e2 d/c/ |"Gm7" B3/2 G/-"C7" G2- |"F7" G2 z2 |]

######################## INPUT TEXT ########################

This is a Chinese folk song from the Jiangnan region. It was created during the Qianlong era (1735-1796) of the Qing dynasty. Over time, many regional variations were created, and the song gained popularity both in China and abroad. One version of the song describes a custom of giving jasmine flowers, popular in the southern Yangtze delta region of China.

####################### OUTPUT TUNES #######################

X:1
L:1/8
Q:1/4=100
M:2/4
K:C
"^Slow" DA A2 | GA c2- | c2 G2 | c2 GF | GA/G/ F2 | E2 DC | DA A2 | GA c2- | c2 GA | cd- d2 |
 cA c2- | c2 GA | cd- d2 | cA c2- | c2 GA | c2 A2 | c2 d2 | cA c2- | c2 c2 | A2 G2 | F2 AG | F2 ED |
 CA,/C/ D2- | D2 CD | F2 A2 | G2 ED | CG A2 | G2 FD | CA,/C/ D2- | D2 CD | F2 A2 | G2 ED |
 CG A2 | G2 FD | CA,/C/ D2- | D2 z2 :|

X:2
L:1/8
Q:1/4=100
M:2/4
K:C
"^ MDolce" Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | EG ed | c2 AG | cA cd |
 A2 AG | E2 ED | CD E2- | E2 z2 |"^ howeveroda" Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- |
 E2 z2 | A2 cA | GA E2- | E2 z2 | GA cd | e2 ed | cd e2- | e2 z2 | ge d2 | cd c2- | c2 z2 |
 Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | EG ed | c2 AG | cA cd | A2 AG | E2 ED |
 CD E2- | E2 z2 |"^DDtisata" Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | A2 cA |
 GA E2- | E2 z2 | GA cd | e2 ed | cd e2- | e2 z2 | ge d2 | cd c2- | c2 z2 | Ac de | d2 AG |
 cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 |
 Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 |"^  Easy" Ac de | d2 AG | cA cd |
 A2 AG | E2 ED | CD E2- | E2 z2 | Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 |]

X:3
L:1/8
Q:1/4=60
M:4/4
K:C
"^S books defe.." AA A2 cdcc | AcAG A4- | A8 | A,4 CD C2 | A,4 cdcA | A2 GA- A4- | A2 GA A2 AA |
 AG E2 D2 C2 | D6 ED | C2 D4 C2 | D2 C2 D4 | C2 A,2 CD C2 | A,4 cdcA | A2 GA- A4- | A2 GA A2 AA |
 AG E2 D2 C2 | D6 z2 |]
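
To check whether generated tunes follow the fields requested in the input text (for example Meter-6/8 and Key-D), the ABC header lines can be read back out of the output. The following is a minimal sketch, assuming each tune starts with an X: line and its header ends at the K: line, as in the examples above; split_tunes and header_fields are hypothetical helper names, and the sample text is a shortened excerpt of those examples.

```python
def split_tunes(abc_text):
    """Split ABC text into tunes; each tune starts at an 'X:' line."""
    tunes, current = [], []
    for line in abc_text.splitlines():
        if line.startswith("X:") and current:
            tunes.append("\n".join(current))
            current = []
        if line.strip():  # skip blank separator lines
            current.append(line)
    if current:
        tunes.append("\n".join(current))
    return tunes


def header_fields(tune):
    """Collect single-letter header fields up to and including 'K:'."""
    fields = {}
    for line in tune.splitlines():
        line = line.strip()
        if len(line) >= 2 and line[0].isalpha() and line[1] == ":":
            fields[line[0]] = line[2:].strip()
            if line[0] == "K":  # K: is the last header field of a tune
                break
    return fields


sample = """X:1
L:1/8
M:6/8
K:D
 A | BEE BEE | Bdf edB | BAF FEF | DFA BAF |

X:2
L:1/4
M:4/4
K:F
"F" A3 A |"Am7" A2"D7" A2 |
"""

for tune in split_tunes(sample):
    fields = header_fields(tune)
    matches = fields.get("M") == "6/8" and fields.get("K") == "D"
    print(fields["X"], fields.get("M"), fields.get("K"), "matches request:", matches)
```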

📚 Documentation

Model description

This language-music model is BART-base fine-tuned on 282,870 English text-music pairs, where all scores are represented in ABC notation. It was introduced in the paper Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task by Wu et al. and released in this repository.

It is capable of generating complete and semantically consistent sheet music directly from natural language descriptions. To the best of our knowledge, this is the first model to achieve text-conditional symbolic music generation trained on real text-music pairs, with the music generated entirely by the model and without any hand-crafted rules.

This language-music model is available online at Textune: Generating Tune from Text, where you can enter a text description and receive sheet music generated by the model.

Due to copyright reasons, we are unable to publicly release the training dataset of this model. Instead, we have made available the WikiMusicText (WikiMT) dataset, which includes 1010 pairs of text-music data and can be used to evaluate the performance of language-music models.

Intended uses & limitations

You can use this model for text-conditional music generation. All scores generated by this model can be written on one stave (for vocal solo or instrumental solo) in standard classical notation, and are in a variety of styles, e.g., blues, classical, folk, jazz, pop, and world music. We recommend using the script in this repository for inference. The generated tunes are in ABC notation, and can be converted to sheet music or audio using this website, or this software.
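
Because external ABC tools usually take a file as input, one simple way to prepare the output for conversion is to collect several generated tunes into a single ABC file. The sketch below assumes each tune is a string as printed by the generation code (which always starts with X:1) and renumbers the X: reference fields so that they are unique within the file; renumber, write_tunebook, and the file name generated_tunes.abc are hypothetical.

```python
import os
import tempfile


def renumber(tune, number):
    """Replace the tune's 'X:' reference number, or add one if missing."""
    lines = [ln for ln in tune.strip().splitlines() if not ln.startswith("X:")]
    return "\n".join([f"X:{number}"] + lines)


def write_tunebook(tunes, path):
    """Write tunes to one ABC file, numbered 1..n and separated by blank lines."""
    body = "\n\n".join(renumber(t, i) for i, t in enumerate(tunes, start=1))
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(body + "\n")
    return path


# Two shortened tunes in the shape produced by the generation code.
generated = [
    "X:1\nL:1/8\nM:6/8\nK:D\n BEE BEE | Bdf edB |",
    "X:1\nL:1/8\nM:6/8\nK:D\n DED F2 A | d2 f ecA |",
]

out_path = write_tunebook(
    generated, os.path.join(tempfile.gettempdir(), "generated_tunes.abc")
)
with open(out_path, encoding="utf-8") as fh:
    print(fh.read())
```

The resulting file can then be opened in whichever ABC converter or editor you prefer.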

Its creativity is limited: it does not perform well on tasks requiring a high degree of creativity (e.g., melody style transfer), and it is input-sensitive. For more information, please check our paper.

🔧 Technical Details

The model is BART-base fine-tuned on 282,870 English text-music pairs. Training on these pairs teaches the model to map a text description to a music score in ABC notation.

📄 License

This project is licensed under the MIT license.

BibTeX entry and citation info

@inproceedings{
wu2023exploring,
title={Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task}, 
author={Shangda Wu and Maosong Sun},
booktitle={The AAAI-23 Workshop on Creative AI Across Modalities},
year={2023},
url={https://openreview.net/forum?id=QmWXskBhesn}
}
