Discogs Maest 10s Pw 129e

Developed by mtg-upf

MAEST is a family of Transformer models based on PaSST and focused on music analysis applications, in particular music genre classification.

Audio Classification

Transformers

#Music genre classification #Transformer architecture #Mel spectrogram

Release Time: 9/27/2023

Model Overview

MAEST is a pre-trained music audio representation model for music genre classification tasks, capable of predicting 400 music genres.

Model Features

Efficient music representation learning

Learns music audio representations efficiently through supervised training

Broad music genre coverage

Supports classification of 400 music genres sourced from Discogs

Downstream task adaptability

Intermediate layer representations perform excellently in various music analysis tasks

Model Capabilities

Music genre classification

Music emotion recognition

Instrument detection

Music audio feature extraction

Use Cases

Music information retrieval

Automatic music genre tagging

Automatically add genre labels to music libraries

Supports 400 genre classifications

Music recommendation systems

Genre-based similar music recommendation

Find similar tracks using music genre features

🚀 Model Card for discogs-maest-10s-pw-129e

MAEST is a Transformer-based model family focused on music analysis. It can be used for music style classification and various downstream music analysis tasks, offering good performance according to the original paper.

🚀 Quick Start

The MAEST models can be used with the audio-classification pipeline of the transformers library. Because they rely on custom code, the pipeline must be created with trust_remote_code=True. Here is an example:

import numpy as np
from transformers import pipeline

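# 30 seconds of random noise standing in for real audio sampled at 16 kHz
audio = np.random.randn(30 * 16000)

# MAEST relies on custom code, so trust_remote_code=True is required
pipe = pipeline("audio-classification", model="mtg-upf/discogs-maest-10s-pw-129e", trust_remote_code=True)
pipe(audio)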

[{'score': 0.6158794164657593, 'label': 'Electronic---Noise'},
 {'score': 0.08825448155403137, 'label': 'Electronic---Experimental'},
 {'score': 0.08772594481706619, 'label': 'Electronic---Abstract'},
 {'score': 0.03644488751888275, 'label': 'Rock---Noise'},
 {'score': 0.03272806480526924, 'label': 'Electronic---Musique Concrète'}]

✨ Features

Music Analysis Focus: MAEST is designed for music analysis applications, especially music style classification.
Multiple Usage Scenarios: It can be used directly for music style prediction and also performs well in downstream tasks such as music genre recognition, emotion recognition, and instrument detection.
Available in Multiple Libraries: The models are available for inference in the Essentia library and for both inference and training in the official repository.

📦 Installation

Since the model is used with the transformers library, you can install it via the following command:

pip install transformers

💻 Usage Examples

Basic Usage

import numpy as np
from transformers import pipeline

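# 30 seconds of random noise standing in for real audio sampled at 16 kHz
audio = np.random.randn(30 * 16000)

# MAEST relies on custom code, so trust_remote_code=True is required
pipe = pipeline("audio-classification", model="mtg-upf/discogs-maest-10s-pw-129e", trust_remote_code=True)
pipe(audio)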

Advanced Usage

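# Pipeline parameters can be adjusted for different requirements.
# batch_size only matters when several inputs are passed at once;
# top_k limits the number of returned labels and is passed at call time.
import numpy as np
from transformers import pipeline

# 30 seconds of random noise standing in for real audio sampled at 16 kHz
audio = np.random.randn(30 * 16000)
pipe = pipeline("audio-classification", model="mtg-upf/discogs-maest-10s-pw-129e", trust_remote_code=True, batch_size=2)
results = pipe(audio, top_k=3)  # keep only the three highest-scoring styles
print(results)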
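
This card does not describe how audio longer than the model's input is handled. The following is a minimal sketch of one option, assuming that "10s" in the model name denotes a 10-second input window (an inference from the name, not something stated here): split the signal into fixed windows, classify each, and average the per-label scores. The helpers `split_into_windows`, `average_scores`, and `classify_long_audio` are hypothetical names introduced for illustration, and `classify` stands for any callable that returns a list of `{"label", "score"}` dictionaries, such as the pipeline object from the examples above.

```python
from collections import defaultdict

SAMPLE_RATE = 16000   # stated in this card: audio @16kHz
WINDOW_SECONDS = 10   # assumption: inferred from "10s" in the model name


def split_into_windows(samples, window_size):
    """Return consecutive non-overlapping windows, dropping a shorter tail."""
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, window_size)]


def average_scores(per_window_results):
    """Average label scores over windows; a label absent from a window counts as 0."""
    totals = defaultdict(float)
    for results in per_window_results:
        for item in results:
            totals[item["label"]] += item["score"]
    count = len(per_window_results)
    averaged = [{"label": label, "score": total / count}
                for label, total in totals.items()]
    return sorted(averaged, key=lambda item: item["score"], reverse=True)


def classify_long_audio(samples, classify, window_size=SAMPLE_RATE * WINDOW_SECONDS):
    """Classify each window with `classify` and average the results."""
    windows = split_into_windows(samples, window_size)
    if not windows:
        raise ValueError("audio is shorter than one window")
    return average_scores([classify(window) for window in windows])
```

With the pipeline from the examples above, a call could look like `classify_long_audio(audio, pipe)`. Averaging is only one possible aggregation; taking the maximum per label is an equally simple alternative.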

📚 Documentation

Model Details

MAEST is a family of Transformer models based on PaSST and focused on music analysis applications. The MAEST models are also available for inference in the Essentia library and for inference and training in the official repository. You can try the MAEST interactive demo on Replicate.

⚠️ Important Note

This model is available under CC BY-NC-SA 4.0 license for non-commercial applications and under proprietary license upon request. Contact us for more information.

⚠️ Important Note

The MAEST models rely on custom code. Set trust_remote_code=True to use them within the 🤗Transformers' audio-classification pipeline.

Property	Details
Developed by	Pablo Alonso
Shared by	Pablo Alonso
Model Type	Transformer
License	cc-by-nc-sa-4.0
Finetuned from model	PaSST

Model Sources

Repository: MAEST
Paper: Efficient Supervised Training of Audio Transformers for Music Representation Learning

Uses

MAEST is a music audio representation model pre-trained on the task of music style classification. According to the evaluation in the original paper, it performs well in several downstream music analysis tasks.

Direct Use

The MAEST models can make predictions for a taxonomy of 400 music styles derived from the public metadata of Discogs.
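
The predicted labels shown in the Quick Start output follow a `Genre---Style` pattern. As a minimal sketch, assuming every label uses `---` as the separator (observed in the example output, not confirmed for the full taxonomy), the style-level predictions can be rolled up into parent genres. The helper names `split_label` and `scores_by_genre` are hypothetical.

```python
from collections import defaultdict

SEPARATOR = "---"  # observed in labels such as 'Electronic---Noise'


def split_label(label):
    """Split 'Genre---Style' into (genre, style); style is None if no separator."""
    genre, found, style = label.partition(SEPARATOR)
    return genre, (style if found else None)


def scores_by_genre(results):
    """Sum the scores of the returned styles under each parent genre."""
    totals = defaultdict(float)
    for item in results:
        genre, _style = split_label(item["label"])
        totals[genre] += item["score"]
    # Highest-scoring genre first
    return dict(sorted(totals.items(), key=lambda pair: pair[1], reverse=True))


# Top-5 predictions copied from the Quick Start example in this card
results = [
    {"score": 0.6158794164657593, "label": "Electronic---Noise"},
    {"score": 0.08825448155403137, "label": "Electronic---Experimental"},
    {"score": 0.08772594481706619, "label": "Electronic---Abstract"},
    {"score": 0.03644488751888275, "label": "Rock---Noise"},
    {"score": 0.03272806480526924, "label": "Electronic---Musique Concrète"},
]
print(scores_by_genre(results))
```

Because the pipeline returns only the top-scoring styles by default, these sums cover just the returned labels rather than the full 400-style distribution.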

Downstream Use

The original paper reports good performance for the MAEST models in downstream applications related to music genre recognition, music emotion recognition, and instrument detection. Specifically, it reports that the best performance is obtained from representations extracted from intermediate layers of the model.

Out-of-Scope Use

The model has not been evaluated outside the context of music understanding applications, so we are unaware of its performance outside its intended domain. Since the model is intended to be used within the audio-classification pipeline, it is important to mention that MAEST is NOT a general-purpose audio classification model (such as AST), so it should not be expected to perform well in tasks such as AudioSet.

Training Details

Training Data

Our models were trained using Discogs20, an MTG in-house dataset featuring 3.3M music tracks matched to Discogs' metadata.

Training Procedure

Most training details are described in the paper and in the official implementation of the model.

Preprocessing

MAEST models rely on mel-spectrograms originally extracted with the Essentia library and used in several previous publications. In Transformers, this mel-spectrogram signature is replicated to a certain extent using audio_utils, which has a very small (but not negligible) impact on the predictions.

Evaluation, Metrics, and results

The MAEST models were pre-trained on the task of music style classification, and their internal representations were evaluated via downstream MLP probes in several benchmark music understanding tasks. Check the original paper for details.

Environmental Impact

Hardware Type: 4 x Nvidia RTX 2080 Ti
Hours used: approx. 32
Carbon Emitted: approx. 3.46 kg CO2 eq.

Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
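
Estimates of this kind multiply the energy consumed by a carbon intensity factor. The sketch below illustrates that arithmetic under two assumptions that are not stated in this card: a power draw of 250 W (the nominal TDP of an RTX 2080 Ti) and a carbon intensity of 0.432 kg CO2 eq. per kWh. With these assumed values, 32 hours at 250 W lands close to the reported figure; whether the 32 hours refer to wall-clock time or to GPU-hours is not specified here.

```python
def estimate_emissions_kg(power_watts, hours, kg_co2_per_kwh):
    """Energy (kWh) times carbon intensity (kg CO2 eq. per kWh)."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh


ASSUMED_POWER_WATTS = 250.0     # assumption: nominal TDP of one RTX 2080 Ti
ASSUMED_KG_CO2_PER_KWH = 0.432  # assumption: carbon intensity, not given in this card
HOURS_USED = 32.0               # from this card (approximate)

estimate = estimate_emissions_kg(ASSUMED_POWER_WATTS, HOURS_USED, ASSUMED_KG_CO2_PER_KWH)
print(f"{estimate:.2f} kg CO2 eq.")
```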

Technical Specifications

Model Architecture and Objective

Audio Spectrogram Transformer (AST)

Compute Infrastructure

Local infrastructure
- Hardware: 4 x Nvidia RTX 2080 Ti
- Software: PyTorch

🔧 Technical Details

The MAEST models are based on the PaSST architecture. They rely on mel-spectrograms for pre-processing, and the internal representations are evaluated via downstream MLP probes in music understanding tasks. The training details are mainly described in the paper and official implementation.

📄 License

This model is available under CC BY-NC-SA 4.0 license for non-commercial applications and under proprietary license upon request. Contact us for more information.

Citation

BibTeX:

@inproceedings{alonso2023music,
  title={Efficient supervised training of audio transformers for music representation learning},
  author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry},
  booktitle={Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)},
  year={2023},
  organization={International Society for Music Information Retrieval (ISMIR)}
}

APA:

Alonso-Jiménez, P., Serra, X., & Bogdanov, D. (2023). Efficient Supervised Training of Audio Transformers for Music Representation Learning. In Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023).

Model Card Authors

Pablo Alonso

Model Card Contact

Twitter: @pablo__alonso
Github: @palonso
mail: pablo dot alonso at upf dot edu
