MERT V1 330M

Developed by m-a-p
MERT-v1-330M is a music understanding model pre-trained with the masked language modeling (MLM) paradigm. It has 330M parameters, accepts audio at a 24 kHz sample rate, and is suited to a range of music information retrieval tasks.
Downloads 16.92k
Release Time: 3/17/2023

Model Overview

This model is pre-trained with the masked language modeling (MLM) paradigm on a large-scale music dataset of 160,000 hours. It extracts and models music features that can be reused in downstream tasks such as music classification and music generation.

Model Features

Large-Scale Pre-training
Trained on 160,000 hours of music data covering a wide range of styles and genres
High Audio Quality Processing
Supports 24 kHz high-sample-rate audio input, capturing richer musical detail
Improved MLM Paradigm
Uses pseudo labels from 8 EnCodec codebooks and in-batch noise mixing to improve pre-training
Multi-Task Generalization Capability
Generalizes well across downstream music understanding tasks
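
The noise mixing idea named above can be illustrated with a small sketch. It only shows the general technique of adding another clip from the same batch as background noise. The function name, the mixing probability, and the gain are hypothetical values chosen for illustration, not the settings used to train MERT.

```python
import random


def mix_in_batch_noise(batch, mix_prob=0.5, noise_gain=0.3, rng=None):
    """Return a new batch in which some clips have another clip added as noise.

    batch: list of clips, each clip a list of float samples.
    mix_prob and noise_gain are illustrative assumptions, not MERT's settings.
    """
    if rng is None:
        rng = random.Random()
    mixed = []
    for i, clip in enumerate(batch):
        out = list(clip)  # copy so the input batch is left untouched
        if len(batch) > 1 and rng.random() < mix_prob:
            # Pick a different clip from the same batch to act as noise.
            j = rng.choice([k for k in range(len(batch)) if k != i])
            noise = batch[j]
            # Mix only over the overlapping length of the two clips.
            for t in range(min(len(out), len(noise))):
                out[t] += noise_gain * noise[t]
        mixed.append(out)
    return mixed
```

During masked pre-training the model would see the mixed clips as input, so it has to predict targets despite the interfering audio.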

Model Capabilities

Music Feature Extraction
Music Genre Classification
Music Emotion Recognition
Music Generation Support
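
A minimal sketch of music feature extraction follows. It assumes the checkpoint is available on Hugging Face under the id `m-a-p/MERT-v1-330M` and loads through the `transformers` library with `trust_remote_code=True`. Neither detail is stated on this page, so treat both as assumptions. `mean_pool` and `extract_clip_embedding` are hypothetical helper names, and using the last hidden layer is an arbitrary choice, since the most useful layer may depend on the task.

```python
MODEL_ID = "m-a-p/MERT-v1-330M"  # assumed Hugging Face repository id
EXPECTED_SAMPLE_RATE = 24_000    # the model expects 24 kHz audio


def mean_pool(frames):
    """Average per-frame feature vectors into one clip-level vector."""
    if not frames:
        raise ValueError("no frames to pool")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def extract_clip_embedding(waveform, sample_rate):
    """Return one embedding for a mono waveform (sequence of float samples)."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample before calling"
        )
    # Heavy dependencies are imported lazily so the helpers above stay usable
    # without them.
    import torch
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    processor = Wav2Vec2FeatureExtractor.from_pretrained(
        MODEL_ID, trust_remote_code=True
    )
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # hidden_states is a tuple of (batch, time, hidden) tensors, one per layer.
    last_layer = outputs.hidden_states[-1][0]
    return mean_pool(last_layer.tolist())
```

The resulting vector can then feed a lightweight classifier for tasks such as genre or emotion recognition.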

Use Cases

Music Recommendation Systems
Music Genre Classification
Automatically identifies and classifies the stylistic features of music pieces
Can be used for front-end processing in personalized music recommendation systems
Music Content Analysis
Music Emotion Analysis
Analyzes the emotional characteristics expressed in music pieces
Suitable for application scenarios such as music therapy and emotion recognition
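
As a rough illustration of the recommendation front-end use case, the sketch below ranks catalog tracks by cosine similarity to a query embedding. The embeddings are tiny made-up vectors standing in for clip-level features from the model, and `rank_similar` is a hypothetical helper, not part of any published API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, catalog, top_k=3):
    """Return the top_k (track_name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings for illustration only; real ones would come from the model.
catalog = {
    "track_a": [1.0, 0.0],
    "track_b": [0.0, 1.0],
    "track_c": [1.0, 1.0],
}
ranking = rank_similar([1.0, 0.1], catalog, top_k=2)
```

The same similarity scores could drive a nearest-neighbor genre or emotion classifier if labeled reference tracks are available.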