mert-base: A Free, Open-Source Acoustic Music Understanding Model for Music Analysis Applications

Mert Base

Developed by yangwang825

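MERT is an acoustic music understanding model based on self-supervised learning. It is pre-trained with pseudo-labels provided by teacher models.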

Audio Classification

Transformers

#AcousticMusicUnderstanding #SelfSupervisedPretraining #MultiSamplingRateSupport

Downloads 26

Release Time : 8/6/2023

Model Overview

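MERT targets audio classification and is particularly suited to music understanding. During masked language modeling (MLM) style acoustic pre-training, teacher models supply pseudo-labels to improve model performance.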

Model Features

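Self-Supervised Pre-training

Uses large-scale self-supervised training to learn effective features without requiring large amounts of labeled data

Teacher Model Guidance

Teacher models provide pseudo-labels during pre-training to improve training

Multi-Sampling-Rate Support

Handles audio input at sampling rates from 16 kHz to 44.1 kHz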
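The card does not say how inputs at different sampling rates are handled. One common approach, assumed here rather than taken from the repository, is to resample every clip to the single rate the feature extractor expects before batching. The sketch below uses plain linear interpolation; `resample_linear` and the 24 kHz target are hypothetical, and in practice the target would be read from `feature_extractor.sampling_rate`.

```python
import numpy as np

def resample_linear(waveform: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a mono waveform by linear interpolation (illustrative helper)."""
    if orig_sr == target_sr:
        return waveform.astype(np.float32)
    duration = len(waveform) / orig_sr
    n_target = int(round(duration * target_sr))
    # Time stamps (in seconds) of the original and the resampled sample grids
    t_orig = np.arange(len(waveform)) / orig_sr
    t_target = np.arange(n_target) / target_sr
    # np.interp clamps to the edge values for time stamps beyond the last sample
    return np.interp(t_target, t_orig, waveform).astype(np.float32)

target_sr = 24000  # assumed target rate, not confirmed by the model card
clip_44k = np.random.rand(44100)  # one second at 44.1 kHz
clip_16k = np.random.rand(16000)  # one second at 16 kHz
resampled = [
    resample_linear(clip_44k, 44100, target_sr),
    resample_linear(clip_16k, 16000, target_sr),
]
```

Linear interpolation applies no anti-aliasing filter when downsampling, so a dedicated resampler such as `librosa.resample` or `torchaudio.functional.resample` is usually a better choice for real audio.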

Model Capabilities

Audio feature extraction

Music classification

Acoustic signal processing

Use Cases

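Music Analysis

Music genre classification

Automatically classifies music clips by genre

Music emotion recognition

Identifies the type of emotion expressed in the music

Audio Processing

Audio feature extraction

Extracts high-level feature representations from audio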
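For the feature-extraction use case, a clip-level embedding is often obtained by averaging frame-level hidden states while ignoring padded frames. The sketch below illustrates that pooling step on simulated data. The shapes, the mask, and the helper `masked_mean_pool` are hypothetical and are not taken from this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, frames, hidden = 2, 75, 768  # hypothetical shapes, for illustration only

# Stand-in for frame-level hidden states produced by an audio encoder
hidden_states = rng.normal(size=(batch, frames, hidden))

# True marks real frames, False marks padding; the second clip is shorter
frame_mask = np.ones((batch, frames), dtype=bool)
frame_mask[1, 50:] = False

def masked_mean_pool(states: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average hidden states over time, counting only unpadded frames."""
    weights = mask[..., None].astype(states.dtype)
    return (states * weights).sum(axis=1) / weights.sum(axis=1)

clip_embeddings = masked_mean_pool(hidden_states, frame_mask)  # (batch, hidden)
```

Each row of `clip_embeddings` can then be passed to a downstream classifier or used for similarity search.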

🚀 MERT

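MERT (Acoustic Music Understanding Model with Large-Scale Self-supervised Training) incorporates teacher models that provide pseudo-labels during masked language modeling (MLM) style acoustic pre-training.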
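The card gives no details about the teachers or the training loss. As a rough illustration of the general idea of MLM-style pre-training with pseudo-labels, the sketch below computes a cross-entropy loss only on masked frames, with discrete teacher codes as targets. The sizes, the 30% masking ratio, and the random logits are hypothetical stand-ins, not MERT's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_frames, num_codes = 50, 8  # hypothetical sizes

# Pseudo-labels: one discrete code per frame, as a teacher model might emit
teacher_labels = rng.integers(0, num_codes, size=num_frames)

# Choose roughly 30% of the frames to mask (assumed ratio)
mask = rng.random(num_frames) < 0.3
mask[0] = True  # guarantee at least one masked frame

# Stand-in for the student's per-frame logits over the teacher's codes
student_logits = rng.normal(size=(num_frames, num_codes))

def masked_cross_entropy(logits, labels, mask):
    """Mean negative log-likelihood of the labels, over masked frames only."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(nll[mask].mean())

loss = masked_cross_entropy(student_logits, teacher_labels, mask)
```

Restricting the loss to masked frames forces the student to infer the hidden content from its surrounding context instead of copying the input.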

The pre-trained weights of MERT come from m-a-p/MERT-v1-95M. In this repository, MERT is registered for the AutoModelForAudioClassification auto class.

🚀 Quick Start

Model Information

Property	Details
Model type	Audio classification model
Pre-trained weights source	m-a-p/MERT-v1-95M

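Code Usage

Dependencies

Running the code requires the transformers and numpy libraries, plus PyTorch, because the example requests PyTorch tensors (return_tensors='pt'). You can install them with the following command:

pip install transformers numpy torch

Code Example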

import numpy as np
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Some configurations
model_id = 'yangwang825/mert-base'
batch_size = 4
num_classes = 10
max_duration = 1.0

# Initialise the extractor and model
feature_extractor = AutoFeatureExtractor.from_pretrained(
    model_id, 
    trust_remote_code=True
)
mert = AutoModelForAudioClassification.from_pretrained(
    model_id,
    num_labels=num_classes,
    ignore_mismatched_sizes=True,
    trust_remote_code=True
)

# Simulate four mono waveforms of different lengths; all are treated as
# sampled at `feature_extractor.sampling_rate` and padded or truncated below
audio_arrays = [
    np.random.rand(16000, ),
    np.random.rand(24000, ),
    np.random.rand(22050, ),
    np.random.rand(44100, )
]
inputs = feature_extractor(
    audio_arrays, # List of waveforms in numpy array format
    sampling_rate=feature_extractor.sampling_rate, 
    max_length=int(feature_extractor.sampling_rate * max_duration), 
    padding='max_length', 
    truncation=True, 
    return_tensors='pt'
)
# The shape of `input_values` is (batch_size, sample_rate * max_duration)
input_values = inputs['input_values']
outputs = mert(**inputs)
# The shape of `logits` is (batch_size, num_classes)
logits = outputs['logits']
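The `logits` returned above are unnormalised scores. A minimal sketch of turning them into class probabilities and predicted labels follows. The logits array and the `id2label` mapping are hypothetical stand-ins, since the card does not define any label names. With the real model, `logits.detach().numpy()` would take the place of `logits_np`.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in logits with shape (batch_size, num_classes)
logits_np = np.array([[2.0, 0.5, -1.0],
                      [0.1, 0.2, 3.0]])
id2label = {0: 'rock', 1: 'jazz', 2: 'classical'}  # hypothetical label names

probs = softmax(logits_np)          # each row sums to 1
pred_ids = probs.argmax(axis=-1)    # index of the most likely class per clip
pred_labels = [id2label[int(i)] for i in pred_ids]
```

Because the example loads the checkpoint with a new `num_labels` and `ignore_mismatched_sizes=True`, the classification head is most likely freshly initialised, so predictions should only become meaningful after fine-tuning on labeled data.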