discogs-maest-20s-pw-129e: Open-Source Music Analysis Model for Music Genre Classification


Developed by mtg-upf

MAEST is a series of Transformer models based on PaSST, focused on music analysis applications, particularly music genre classification.

Audio Classification

Transformers


Release time: 9/27/2023

Model Overview

MAEST is a Transformer-based music audio representation model primarily used for music genre classification tasks and performs well in various downstream music analysis tasks.

Model Features

Efficient music representation learning

Pre-trained on music genre classification tasks to learn efficient music audio representations.

Multi-task downstream applications

Excels in downstream applications such as music genre recognition, music emotion recognition, and instrument detection.

Intermediate layer representation extraction

According to the original paper, representations extracted from intermediate layers of the model yield the best downstream performance.

Model Capabilities

Music genre classification

Music genre recognition

Music emotion recognition

Instrument detection

Use Cases

Music analysis

Music genre classification

Predicts labels from a taxonomy of 400 music styles derived from Discogs' public metadata.

Performs well in various downstream music analysis tasks.

Music emotion recognition

Identify emotional characteristics of music.

The original paper reports good downstream performance on this task.

🚀 Model Card for discogs-maest-20s-pw-129e

MAEST is a Transformer-based model family for music analysis. It is pre-trained on music style classification and performs well in various downstream music analysis tasks.

🚀 Quick Start

The MAEST models can be used with the audio-classification pipeline of the Transformers library. Here is an example:

import numpy as np
from transformers import pipeline

# audio @16kHz: 30 seconds of random noise as a placeholder signal
audio = np.random.randn(30 * 16000)

# MAEST relies on custom code, so trust_remote_code=True is required
pipe = pipeline("audio-classification", model="mtg-upf/discogs-maest-20s-pw-129e", trust_remote_code=True)
pipe(audio)

Example output (scores vary from run to run because the input is random noise):

[{'score': 0.6158794164657593, 'label': 'Electronic---Noise'},
 {'score': 0.08825448155403137, 'label': 'Electronic---Experimental'},
 {'score': 0.08772594481706619, 'label': 'Electronic---Abstract'},
 {'score': 0.03644488751888275, 'label': 'Rock---Noise'},
 {'score': 0.03272806480526924, 'label': 'Electronic---Musique Concrète'}]
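The Quick Start classifies a single 30-second clip. For full-length tracks, one option is to split the waveform into fixed-length windows, classify each window, and average the per-label scores. The sketch below is a minimal, hypothetical version of that idea: it assumes 16 kHz mono input and a 20-second window (inferred from "20s" in the model name, not stated in this card), and the helpers `chunk_audio` and `average_scores` are illustrative, not part of MAEST. The model call is left commented out because it downloads the checkpoint.

```python
import numpy as np
from collections import defaultdict

SAMPLE_RATE = 16000   # the Quick Start example feeds 16 kHz audio
WINDOW_SECONDS = 20   # assumption: inferred from "20s" in the model name


def chunk_audio(audio, sr=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Split a 1-D waveform into consecutive, non-overlapping windows."""
    size = sr * seconds
    chunks = [audio[start:start + size] for start in range(0, len(audio), size)]
    # Hypothetical policy: drop a trailing fragment shorter than half a window,
    # unless it is the only chunk.
    if len(chunks) > 1 and len(chunks[-1]) < size // 2:
        chunks = chunks[:-1]
    return chunks


def average_scores(per_chunk_predictions):
    """Average pipeline outputs (lists of {'label', 'score'} dicts) over chunks.

    A label missing from a chunk's output counts as 0 for that chunk.
    """
    totals = defaultdict(float)
    for predictions in per_chunk_predictions:
        for p in predictions:
            totals[p["label"]] += p["score"]
    n = len(per_chunk_predictions)
    return sorted(((label, s / n) for label, s in totals.items()),
                  key=lambda item: item[1], reverse=True)


# Usage with the model (not run here, since it downloads the checkpoint):
# from transformers import pipeline
# pipe = pipeline("audio-classification", model="mtg-upf/discogs-maest-20s-pw-129e",
#                 trust_remote_code=True)
# track = ...  # 1-D float array sampled at 16 kHz
# per_chunk = [pipe(chunk, top_k=400) for chunk in chunk_audio(track)]
# print(average_scores(per_chunk)[:5])
```

Passing a large `top_k` to the pipeline makes every chunk report all labels, so a label is not penalized just because it fell outside one chunk's default top 5.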

⚠️ Important Note

This model is available under the CC BY-NC-SA 4.0 license for non-commercial applications and under a proprietary license upon request. Contact us for more information.

⚠️ Important Note

The MAEST models rely on custom code. Set trust_remote_code=True to use them within the 🤗 Transformers audio-classification pipeline.

✨ Features

Music Representation: Pre-trained on music style classification, it can generate effective music representations.
Downstream Performance: Performs well in multiple downstream music analysis tasks such as genre recognition, emotion recognition, and instrument detection.

📚 Documentation

Model Details

Model Description

Developed by: Pablo Alonso
Shared by: Pablo Alonso
Model type: Transformer
License: cc-by-nc-sa-4.0
Finetuned from model: PaSST

Model Sources

Repository: MAEST
Paper: Efficient Supervised Training of Audio Transformers for Music Representation Learning

Uses

Direct Use

The MAEST models can make predictions for a taxonomy of 400 music styles derived from the public metadata of Discogs.
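The labels in the Quick Start output, such as `Electronic---Noise`, appear to combine a parent genre and a style with a `---` separator. Assuming every label follows that pattern (inferred from the example output, not documented here), the sketch below groups style scores by parent genre. `split_label` and `scores_by_genre` are hypothetical helpers, and summing scores is only a ranking heuristic, not a calibrated genre probability.

```python
from collections import defaultdict


def split_label(label):
    """Split a 'Genre---Style' label into (genre, style); assumes the '---' separator."""
    genre, _, style = label.partition("---")
    return genre, style


def scores_by_genre(predictions):
    """Sum style scores per parent genre, highest first."""
    totals = defaultdict(float)
    for p in predictions:
        genre, _ = split_label(p["label"])
        totals[genre] += p["score"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# The Quick Start example output:
example = [
    {"score": 0.6158794164657593, "label": "Electronic---Noise"},
    {"score": 0.08825448155403137, "label": "Electronic---Experimental"},
    {"score": 0.08772594481706619, "label": "Electronic---Abstract"},
    {"score": 0.03644488751888275, "label": "Rock---Noise"},
    {"score": 0.03272806480526924, "label": "Electronic---Musique Concrète"},
]
for genre, score in scores_by_genre(example).items():
    print(f"{genre} {score:.3f}")  # Electronic 0.825, then Rock 0.036
```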

Downstream Use

The MAEST models have reported good performance in downstream applications related to music genre recognition, music emotion recognition, and instrument detection. Specifically, the original paper reports that the best performance is obtained from representations extracted from intermediate layers of the model.
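A common way to turn intermediate layers into a fixed-size embedding is to average each chosen layer over its tokens and concatenate the results. The sketch below shows that pooling on NumPy arrays shaped like Transformers `hidden_states` (one `(batch, tokens, dim)` array per layer). It assumes MAEST's custom code returns hidden states when called with `output_hidden_states=True`, which should be verified; the layer indices and the helper `pool_layers` are hypothetical, and token averaging is one pooling choice among several, not necessarily the one used in the paper.

```python
import numpy as np


def pool_layers(hidden_states, layers):
    """Token-average selected layers and concatenate them into one embedding.

    hidden_states: sequence of arrays shaped (batch, tokens, dim), one per layer.
    In the usual Transformers convention, index 0 is the embedding output.
    """
    pooled = [np.asarray(hidden_states[i]).mean(axis=1) for i in layers]
    return np.concatenate(pooled, axis=-1)  # (batch, len(layers) * dim)


# Dummy stand-in for a 12-layer model's hidden states (13 entries incl. embeddings);
# layer i is filled with the value i so the pooled result is easy to check.
dummy = [np.full((2, 5, 4), float(i)) for i in range(13)]
embedding = pool_layers(dummy, layers=[6, 7])  # hypothetical choice of intermediate layers
print(embedding.shape)  # (2, 8)
```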

Out-of-Scope Use

The model has not been evaluated outside the context of music understanding applications, so we do not know how it performs outside its intended domain. Since the model is intended to be used within the audio-classification pipeline, note that MAEST is NOT a general-purpose audio classification model (such as [AST](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)), so it should not be expected to perform well on general audio benchmarks such as AudioSet.

Bias, Risks, and Limitations

The MAEST models were trained using Discogs20, an in-house MTG dataset derived from the public Discogs metadata. While we tried to maximize diversity across the 400 music styles covered in the dataset, we noted an overrepresentation of Western (particularly electronic) music.

Training Details

Training Data

Our models were trained using Discogs20, an in-house MTG dataset featuring 3.3M music tracks matched to Discogs metadata.

Training Procedure

Most training details are described in the paper and the official implementation of the model.

Preprocessing

MAEST models rely on mel-spectrograms originally extracted with the Essentia library and used in several previous publications. In Transformers, this mel-spectrogram signature is replicated to a certain extent using audio_utils; the remaining differences have a very small (but not negligible) impact on the predictions.
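To illustrate what a log-mel front end computes (framing, windowing, FFT, mel projection, log compression), here is a NumPy-only sketch. It is neither the Essentia implementation nor the Transformers audio_utils replication: the parameters (16 kHz, 512-sample frames, hop of 256, 96 mel bands, HTK mel scale, and log10(1 + 10000·x) compression) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # HTK mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters, evenly spaced on the mel scale: shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = edges_hz[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(audio, sr=16000, n_fft=512, hop=256, n_mels=96):
    """Frame, window, FFT, mel-project and log-compress a mono waveform."""
    audio = np.pad(audio, (0, max(0, n_fft - len(audio))))  # ensure at least one frame
    frames = sliding_window_view(audio, n_fft)[::hop] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2           # (frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T          # (frames, n_mels)
    return np.log10(1.0 + 10000.0 * mel)                       # assumed compression


spec = log_mel_spectrogram(np.random.default_rng(0).standard_normal(16000))
print(spec.shape)  # (61, 96) for one second of audio
```

Small choices in such a pipeline (window shape, mel formula, padding, compression) change the output values, which is why a replication can only approximate the original features.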

Evaluation, Metrics, and Results

The MAEST models were pre-trained on the task of music style classification, and their internal representations were evaluated via downstream MLP probes on several benchmark music understanding tasks. Check the original paper for details.
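An MLP probe trains a small classifier on top of frozen embeddings, so its accuracy reflects how much task information the representation already contains. The sketch below is a minimal one-hidden-layer probe in NumPy, trained with full-batch gradient descent on synthetic "embeddings". The architecture, hyperparameters, and the helper `train_mlp_probe` are assumptions for illustration and do not reproduce the paper's probing setup.

```python
import numpy as np


def train_mlp_probe(X, y, n_classes, hidden=64, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer MLP on frozen embeddings X; returns a predict function."""
    rng = np.random.default_rng(seed)
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xs = (X - mean) / std                              # standardize the frozen features
    d = Xs.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                           # one-hot targets
    for _ in range(epochs):
        H = np.maximum(0.0, Xs @ W1 + b1)              # ReLU hidden layer
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)    # numerically stable softmax
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / len(Xs)                          # d(mean cross-entropy)/d(logits)
        GH = (G @ W2.T) * (H > 0)                      # backprop through ReLU, before updating W2
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Xs.T @ GH)
        b1 -= lr * GH.sum(axis=0)

    def predict(X_new):
        H = np.maximum(0.0, ((X_new - mean) / std) @ W1 + b1)
        return (H @ W2 + b2).argmax(axis=1)

    return predict


# Synthetic stand-in for frozen embeddings: three well-separated clusters.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 3.0, (3, 16))
y = rng.integers(0, 3, 300)
X = centers[y] + rng.normal(0.0, 0.5, (300, 16))
predict = train_mlp_probe(X, y, n_classes=3)
print((predict(X) == y).mean())
```

In a real evaluation the probe would be trained on one split and scored on a held-out split; here it is scored on its training data only to keep the sketch short.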

Environmental Impact

Hardware Type: 4 x Nvidia RTX 2080 Ti
Hours used: approx. 32
Carbon Emitted: approx. 3.46 kg CO2 eq.

Carbon emissions estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
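As a back-of-the-envelope check of these figures, the energy drawn by the GPUs can be estimated from their count, power, and runtime, and the reported emissions divided by that energy give the implied carbon intensity. The per-GPU power of 250 W is an assumption (a nominal RTX 2080 Ti board power), and the calculator's exact inputs, such as the compute region or any overhead factor, are not stated in this card.

```python
GPUS = 4
HOURS = 32                 # "approx. 32" from the card
GPU_POWER_KW = 0.25        # assumption: 250 W per RTX 2080 Ti
REPORTED_KG_CO2 = 3.46     # from the card

energy_kwh = GPUS * GPU_POWER_KW * HOURS           # total GPU energy in kWh
implied_intensity = REPORTED_KG_CO2 / energy_kwh   # kg CO2 eq. per kWh
print(f"{energy_kwh:.1f} kWh, ~{implied_intensity:.3f} kg CO2 eq./kWh")
# prints "32.0 kWh, ~0.108 kg CO2 eq./kWh"
```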

Technical Specifications

Model Architecture and Objective

[Audio Spectrogram Transformer (AST)](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)
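AST-style models embed the spectrogram as overlapping patches, so the sequence length grows with the number of mel bands and frames. The sketch below counts patches, assuming the defaults of the Transformers ASTConfig (16×16 patches, frequency and time strides of 10). Whether MAEST uses the same values is not stated in this card, and `num_patches` is a hypothetical helper.

```python
def num_patches(n_mels, n_frames, patch=16, f_stride=10, t_stride=10):
    """Count overlapping patches along frequency and time, as in AST-style patch embedding."""
    n_freq = (n_mels - patch) // f_stride + 1
    n_time = (n_frames - patch) // t_stride + 1
    return n_freq * n_time


print(num_patches(128, 1024))  # 1212 = 12 x 101, the ASTConfig default input size
print(num_patches(96, 1024))   # 909 = 9 x 101, with a hypothetical 96-band input
```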

Compute Infrastructure

Hardware: 4 x Nvidia RTX 2080 Ti
Software: PyTorch

Citation

BibTeX:

@inproceedings{alonso2023music,
  title={Efficient supervised training of audio transformers for music representation learning},
  author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry},
  booktitle={Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)},
  year={2023},
  organization={International Society for Music Information Retrieval (ISMIR)}
}

APA:

Alonso-Jiménez, P., Serra, X., & Bogdanov, D. (2023). Efficient Supervised Training of Audio Transformers for Music Representation Learning. In Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023).

Model Card Authors

Pablo Alonso

Model Card Contact

Twitter: @pablo__alonso
Github: @palonso
Email: pablo dot alonso at upf dot edu

📄 License

This model is available under the CC BY-NC-SA 4.0 license for non-commercial applications and under a proprietary license upon request.
