Open-source discogs-maest-30s-pw-129e model for classifying and analyzing 400 music styles

Discogs Maest 30s Pw 129e

Developed by mtg-upf

MAEST is a series of Transformer models based on PaSST, focused on music analysis applications and capable of classifying 400 music styles.

Audio Classification

Transformers

#Music Genre Classification #Transformer Architecture #Music Representation Learning

Downloads 1,002

Release Time : 9/27/2023

Model Overview

MAEST is a music audio representation model pre-trained on music style classification that performs well in several downstream music analysis tasks.

Model Features

Efficient Music Representation Learning

Pre-trained on music genre classification tasks to learn efficient music audio representations

Multi-task Applicability

Representations extracted from intermediate layers gave the best results on downstream music analysis tasks in the original paper

Large-scale Genre Coverage

Supports classification of 400 music genres from Discogs

Model Capabilities

Music Genre Classification

Music Emotion Recognition

Instrument Detection

Music Audio Feature Extraction

Use Cases

Music Analysis

Music Genre Identification

Automatically identify the music genre of audio files

Predicts labels from a taxonomy of 400 Discogs music styles

Music Emotion Analysis

Analyze emotional characteristics of music

Paper reports good performance in downstream tasks

Instrument Detection

Identify instruments used in music

Paper reports good performance in downstream tasks

🚀 Model Card for discogs-maest-30s-pw-129e

MAEST is a family of Transformer models based on PaSST, specializing in music analysis. It offers models pre-trained on music style classification that perform well in various downstream music analysis tasks.

🚀 Quick Start

The MAEST models can be used with the audio-classification pipeline of the transformers library. Here is a basic example:

import numpy as np
from transformers import pipeline

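# 30 seconds of random noise at 16 kHz, standing in for real audio
audio = np.random.randn(30 * 16000)

# MAEST relies on custom code, so trust_remote_code=True is required
pipe = pipeline("audio-classification", model="mtg-upf/discogs-maest-30s-pw-129e", trust_remote_code=True)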
pipe(audio)

[{'score': 0.6158794164657593, 'label': 'Electronic---Noise'},
 {'score': 0.08825448155403137, 'label': 'Electronic---Experimental'},
 {'score': 0.08772594481706619, 'label': 'Electronic---Abstract'},
 {'score': 0.03644488751888275, 'label': 'Rock---Noise'},
 {'score': 0.03272806480526924, 'label': 'Electronic---Musique Concrète'}]
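The example above uses synthetic 16 kHz noise. Real recordings are often stereo and sampled at 44.1 or 48 kHz. Below is a minimal numpy sketch for converting such audio to mono at 16 kHz before calling the pipeline. It assumes the pipeline expects a mono float array at 16 kHz, as the `# audio @16kHz` comment in the original example suggests. The helper name `to_mono_16k` is hypothetical. Its linear-interpolation resampling is deliberately crude because it applies no anti-aliasing filter.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate used in the Quick Start example


def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: downmix to mono and resample to 16 kHz.

    Crude on purpose: linear interpolation applies no anti-aliasing
    low-pass filter, so use a dedicated resampler for serious work.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # assumes shape (samples, channels)
        audio = audio.mean(axis=1)
    if sr == TARGET_SR:
        return audio
    n_out = int(round(audio.shape[0] * TARGET_SR / sr))
    t_in = np.arange(audio.shape[0]) / sr   # input sample times in seconds
    t_out = np.arange(n_out) / TARGET_SR    # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)


# Two seconds of stand-in stereo audio at 44.1 kHz.
stereo = np.random.randn(44_100 * 2, 2)
mono = to_mono_16k(stereo, 44_100)
print(mono.shape)  # (32000,)
```

The resulting array can then be passed to `pipe(...)` in the same way as the synthetic `audio` array.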

✨ Features

Model Details

MAEST is a family of Transformer models based on PaSST and focused on music analysis applications. The models are available for inference in the Essentia library and for inference and training in the official repository. You can try the MAEST interactive demo on Replicate.

⚠️ Important Note

This model is available under the CC BY-NC-SA 4.0 license for non-commercial applications and under a proprietary license upon request. Contact us for more information.

⚠️ Important Note

The MAEST models rely on custom code. Set trust_remote_code=True to use them within the 🤗 Transformers audio-classification pipeline.

Model Description

Developed by: Pablo Alonso
Shared by: Pablo Alonso
Model type: Transformer
License: cc-by-nc-sa-4.0
Finetuned from model: PaSST

Model Sources

Repository: MAEST
Paper: Efficient Supervised Training of Audio Transformers for Music Representation Learning

Uses

MAEST is a music audio representation model pre-trained on the task of music style classification. According to the original paper, it performs well in several downstream music analysis tasks.

Direct Use

The MAEST models can make predictions for a taxonomy of 400 music styles derived from the public metadata of Discogs.
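The labels in the Quick Start output follow a `Genre---Style` pattern, for example `Electronic---Noise`. The small stdlib sketch below assumes that every one of the 400 labels uses the `---` separator. That is inferred from the example output and is not stated in the card. The sketch splits each label into its parent genre and style, then keeps the best style score per parent genre. Taking the maximum is one simple heuristic and is not something the card prescribes. The function names are hypothetical.

```python
# Example predictions copied from the Quick Start output.
predictions = [
    {"score": 0.6158794164657593, "label": "Electronic---Noise"},
    {"score": 0.08825448155403137, "label": "Electronic---Experimental"},
    {"score": 0.08772594481706619, "label": "Electronic---Abstract"},
    {"score": 0.03644488751888275, "label": "Rock---Noise"},
    {"score": 0.03272806480526924, "label": "Electronic---Musique Concrète"},
]


def split_label(label: str) -> tuple[str, str]:
    """Split a 'Genre---Style' label into (genre, style); separator is assumed."""
    genre, _, style = label.partition("---")
    return genre, style


def best_score_per_genre(preds: list[dict]) -> dict[str, float]:
    """Keep the highest style score seen for each parent genre, best first."""
    best: dict[str, float] = {}
    for p in preds:
        genre, _ = split_label(p["label"])
        best[genre] = max(best.get(genre, 0.0), p["score"])
    return dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True))


print(split_label("Electronic---Musique Concrète"))  # ('Electronic', 'Musique Concrète')
print(list(best_score_per_genre(predictions)))  # ['Electronic', 'Rock']
```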

Downstream Use

The MAEST models have reported good performance in downstream applications related to music genre recognition, music emotion recognition, and instrument detection. Specifically, the original paper reports that the best performance is obtained from representations extracted from intermediate layers of the model.
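The card does not explain how to extract those intermediate representations from the Transformers port. Whatever route you take, probing usually starts from per-layer token activations pooled into one vector per clip. The minimal numpy sketch below assumes a hypothetical stacked array of shape (layers, tokens, dim) with 13 entries and 768 dimensions. These sizes are assumptions and have not been confirmed for this checkpoint. It also uses plain mean pooling over tokens, which is one common choice and not necessarily the paper's exact recipe. The function `layer_embedding` is hypothetical.

```python
import numpy as np


def layer_embedding(hidden_states: np.ndarray, layer: int) -> np.ndarray:
    """Mean-pool one layer's token sequence into a single clip-level vector.

    hidden_states: hypothetical array of shape (n_layers, n_tokens, dim),
    e.g. stacked per-layer outputs of a Transformer encoder.
    """
    if not 0 <= layer < hidden_states.shape[0]:
        raise IndexError(f"layer {layer} out of range")
    return hidden_states[layer].mean(axis=0)


# Stand-in activations: 13 entries (embeddings + 12 blocks, assumed),
# 200 tokens, 768 dims (assumed), filled with random values.
rng = np.random.default_rng(0)
fake_states = rng.standard_normal((13, 200, 768)).astype(np.float32)

# Pool a middle layer and the last layer so both can be fed to a probe.
mid = layer_embedding(fake_states, layer=7)
last = layer_embedding(fake_states, layer=12)
print(mid.shape, last.shape)  # (768,) (768,)
```

Comparing probe results across several layer indices is how one would check, for a given task, whether a middle layer beats the last one.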

Out - of - Scope Use

The model has not been evaluated outside the context of music understanding applications, so its performance outside that domain is unknown. Although it is integrated into the audio-classification pipeline, MAEST is NOT a general-purpose audio classification model (such as [AST](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)), and it is not expected to perform well on tasks like AudioSet.

Bias, Risks, and Limitations

The MAEST models were trained using Discogs20, an in-house MTG dataset. Western music, particularly electronic music, is over-represented, although efforts were made to maximize diversity across the 400 music styles in the dataset.

📦 Installation

No dedicated installation steps are provided. The Quick Start example needs NumPy and the Transformers library with a PyTorch backend.

📚 Documentation

Training Details

Training Data

Our models were trained using Discogs20, an MTG in-house dataset featuring 3.3M music tracks matched to Discogs metadata.

Training Procedure

Most training details are described in the paper and in the official implementation of the model.

Preprocessing

MAEST models rely on mel-spectrograms originally extracted with the Essentia library and used in several previous publications. In Transformers, this mel-spectrogram signature is replicated to a certain extent using audio_utils, which has a very small (but not negligible) impact on the predictions.
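The card does not list the front-end parameters. The minimal numpy sketch below shows what a log-compressed mel-spectrogram computation involves. It assumes 16 kHz input, as in the Quick Start example. The following settings are further assumptions made only for illustration: 512-sample Hann frames, a 256-sample hop, 96 HTK-scale mel bands, and log10(1 + 10000 * x) compression. This is not the Essentia or audio_utils implementation, so its output will not match MAEST's expected input exactly.

```python
import numpy as np

# Assumed front-end parameters (illustrative; not confirmed against Essentia).
SR = 16_000
FRAME = 512
HOP = 256
N_MELS = 96


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=FRAME, n_mels=N_MELS):
    """Triangular filters on the HTK mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(audio):
    """Frame, window, take power spectra, apply mel filters, log-compress."""
    n_frames = 1 + (len(audio) - FRAME) // HOP
    idx = np.arange(FRAME)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(FRAME)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, FRAME // 2 + 1)
    mel = power @ mel_filterbank().T                    # (n_frames, N_MELS)
    return np.log10(1.0 + 10_000.0 * mel)               # compression is an assumption


spec = log_mel_spectrogram(np.random.randn(30 * SR))
print(spec.shape)  # (1874, 96)
```

With these assumed settings, a 30-second clip yields 1 + (480000 - 512) // 256 = 1874 frames of 96 bands.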

Evaluation, Metrics, and Results

The MAEST models were pre-trained on the task of music style classification, and their internal representations were evaluated via downstream MLP probes on several benchmark music understanding tasks. Check the original paper for details.
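The card only says that frozen internal representations were evaluated with downstream MLP probes. The minimal numpy sketch below shows what such a probe looks like. It assumes pre-extracted clip-level embeddings, which are replaced here by synthetic Gaussian clusters. The hyperparameters are arbitrary: 64 hidden units, learning rate 0.1, and 300 full-batch gradient-descent steps. This is not the paper's training setup, and `train_mlp_probe` and `predict` are hypothetical names.

```python
import numpy as np


def train_mlp_probe(x, y, n_classes, hidden=64, lr=0.1, steps=300, seed=0):
    """Fit a one-hidden-layer ReLU MLP with softmax cross-entropy (full-batch GD)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        h_pre = x @ w1 + b1
        h = np.maximum(h_pre, 0.0)                    # ReLU
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)     # softmax
        g_logits = (probs - onehot) / n               # d(mean CE)/d(logits)
        g_w2, g_b2 = h.T @ g_logits, g_logits.sum(axis=0)
        g_h = g_logits @ w2.T
        g_h[h_pre <= 0.0] = 0.0                       # backprop through ReLU
        g_w1, g_b1 = x.T @ g_h, g_h.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return w1, b1, w2, b2


def predict(params, x):
    w1, b1, w2, b2 = params
    return np.argmax(np.maximum(x @ w1 + b1, 0.0) @ w2 + b2, axis=1)


# Synthetic stand-in for frozen embeddings: 3 classes, 32 dims, 100 clips each.
rng = np.random.default_rng(1)
n_classes, dim, per_class = 3, 32, 100
centers = rng.normal(0.0, 2.0, (n_classes, dim))
x = np.concatenate([c + rng.normal(0.0, 1.0, (per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

# Shuffle, split 80/20, and standardize with training statistics only.
order = rng.permutation(len(y))
train_idx, test_idx = order[:240], order[240:]
mu, sigma = x[train_idx].mean(axis=0), x[train_idx].std(axis=0)
x_std = (x - mu) / sigma

params = train_mlp_probe(x_std[train_idx], y[train_idx], n_classes)
test_acc = (predict(params, x_std[test_idx]) == y[test_idx]).mean()
print(f"held-out accuracy: {test_acc:.2f}")
```

Because the embedding model stays frozen, only the small MLP is trained. This is what makes probe accuracy a measure of how much task information the representation already contains.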

Environmental Impact

Hardware Type: 4 x Nvidia RTX 2080 Ti
Hours used: approx. 32
Carbon Emitted: approx. 3.46 kg CO2 eq.

Carbon emissions estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
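The card gives the hardware, the hours, and the resulting estimate, but not the arithmetic in between. The Lacoste et al. calculator multiplies power draw, running time, and the carbon intensity of the local grid. The sketch below assumes roughly 250 W per RTX 2080 Ti, a figure that does not appear in the card. From that, it backs out the grid intensity implied by the reported 3.46 kg. This is a plausibility check and does not reproduce the calculator's actual inputs.

```python
# Emissions = power draw (kW) x hours x grid carbon intensity (kg CO2eq/kWh).
N_GPUS = 4
GPU_POWER_KW = 0.25  # assumption: ~250 W per RTX 2080 Ti (reference TDP)
HOURS = 32           # "approx. 32" from the card
REPORTED_KG = 3.46   # "approx. 3.46 kg CO2 eq." from the card

energy_kwh = N_GPUS * GPU_POWER_KW * HOURS
implied_intensity = REPORTED_KG / energy_kwh

print(f"energy: {energy_kwh:.1f} kWh")                                  # energy: 32.0 kWh
print(f"implied grid intensity: {implied_intensity:.3f} kg CO2eq/kWh")  # implied grid intensity: 0.108 kg CO2eq/kWh
```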

Technical Specifications

Model Architecture and Objective

[Audio Spectrogram Transformer (AST)](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)

Compute Infrastructure

Hardware: 4 x Nvidia RTX 2080 Ti
Software: PyTorch

Citation

BibTeX

@inproceedings{alonso2023music,
  title={Efficient supervised training of audio transformers for music representation learning},
  author={Alonso-Jim{\'e}nez, Pablo and Serra, Xavier and Bogdanov, Dmitry},
  booktitle={Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)},
  year={2023},
  organization={International Society for Music Information Retrieval (ISMIR)}
}

APA

Alonso-Jiménez, P., Serra, X., & Bogdanov, D. (2023). Efficient Supervised Training of Audio Transformers for Music Representation Learning. In Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023).

Model Card Authors

Pablo Alonso

Model Card Contact

Twitter: @pablo__alonso
Github: @palonso
mail: pablo dot alonso at upf dot edu

📄 License

This model is available under the CC BY-NC-SA 4.0 license for non-commercial applications and under a proprietary license upon request.
