MERT V1 95M

Developed by m-a-p
MERT-v1-330M is a music understanding model with 330M parameters, trained with the masked language modeling (MLM) paradigm. It supports a 24 kHz audio sampling rate and a 75 Hz feature rate, and is suited to a range of music information retrieval tasks.
Downloads 83.72k
Release Time: 3/17/2023

Model Overview

MERT-v1-330M is a music audio pre-training model trained using the MLM paradigm. It offers stronger task generalization and a higher audio sampling rate than previous-generation models, and is suited to tasks such as music classification and music generation.
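The 24 kHz sampling rate and 75 Hz feature rate quoted for this model imply a fixed number of audio samples per feature frame. The short sketch below only works through that arithmetic. The helper names are hypothetical, and the frame count is an approximation that ignores any edge effects in the real model.

```python
SAMPLE_RATE_HZ = 24_000   # audio sampling rate quoted for the model
FEATURE_RATE_HZ = 75      # feature (frame) rate quoted for the model

# Number of audio samples covered by one feature frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FEATURE_RATE_HZ


def frame_duration_ms() -> float:
    """Duration of one feature frame in milliseconds."""
    return 1000.0 / FEATURE_RATE_HZ


def approx_num_frames(duration_seconds: float) -> int:
    """Approximate frame count for a clip (hypothetical helper, ignores edge effects)."""
    return int(duration_seconds * FEATURE_RATE_HZ)
```

Under these numbers each frame spans 320 samples, or roughly 13.3 ms of audio, so a 5-second clip yields about 375 frames.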

Model Features

High Audio Sampling Rate
Supports a 24 kHz audio sampling rate for higher-quality audio processing.
Large-scale Training Data
Trained on 160K hours of music data, which improves generalization.
Multi-codebook Pseudo-labeling
Uses 8-codebook pseudo-labels from EnCodec to improve label quality and support music generation tasks.
In-batch Noise Mixing
Adds in-batch noise mixing to MLM prediction to make the model more robust.
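To make the in-batch noise mixing idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each clip is a list of float samples of equal length, and that "mixing" means adding a scaled copy of another clip from the same batch. The function name `mix_in_batch` and the parameters `mix_prob` and `max_gain` are hypothetical.

```python
import random
from typing import List, Optional

Clip = List[float]


def mix_in_batch(batch: List[Clip],
                 mix_prob: float = 0.5,
                 max_gain: float = 0.5,
                 rng: Optional[random.Random] = None) -> List[Clip]:
    """Return a new batch where some clips have another clip from the
    same batch added to them as noise (assumed behaviour, for illustration)."""
    rng = rng or random.Random()
    mixed: List[Clip] = []
    for i, clip in enumerate(batch):
        # Leave the clip untouched if there is nothing to mix with,
        # or if the random draw says not to mix this one.
        if len(batch) < 2 or rng.random() >= mix_prob:
            mixed.append(list(clip))
            continue
        # Pick a different clip from the same batch to act as noise.
        j = rng.choice([k for k in range(len(batch)) if k != i])
        gain = rng.uniform(0.0, max_gain)
        mixed.append([s + gain * n for s, n in zip(clip, batch[j])])
    return mixed
```

In a training setup of this kind, the model would see the mixed audio as input while the prediction targets still come from the clean clip, which is what would push it toward robustness. That pairing is an assumption here and is not spelled out in this page.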

Model Capabilities

Music Classification
Music Information Retrieval
Music Generation

Use Cases

Music Analysis
Music Genre Classification
Classifies music segments into genres such as pop, classical, jazz, etc.
Outperforms previous-generation models in multiple downstream tasks.
Music Emotion Recognition
Identifies emotional features in music, such as happiness, sadness, anger, etc.
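One common way to use frame-level features for clip-level tasks such as genre or emotion classification is to average them over time and feed the result to a small classifier. The sketch below illustrates that pattern in plain Python under stated assumptions: the features are already extracted as a list of per-frame vectors, and the weights, biases, and labels are hypothetical placeholders rather than anything shipped with the model.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one clip-level embedding."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def linear_scores(embedding: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> List[float]:
    """One score per class: dot(weights[c], embedding) + biases[c]."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, biases)]


def predict_label(frames: Sequence[Sequence[float]],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float],
                  labels: Sequence[str]) -> str:
    """Pool the frames, score each class, and return the top label."""
    scores = linear_scores(mean_pool(frames), weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]
```

In practice the classifier weights would be learned on labeled data for the target task, and the pooling step could be swapped for something richer. Mean pooling is used here only because it is the simplest option.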
Music Generation
Music Segment Generation
Generates new music segments based on input audio features.