my_awesome_mind_model: open-source audio classification model

My Awesome Mind Model

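Developed by Gyaneshere

An audio classification model fine-tuned from facebook/wav2vec2-base on the minds14 dataset

Audio classification

Transformers

License: Apache-2.0 #IntentRecognition #SpeechClassification #Wav2Vec2FineTuning

Downloads: 4

Published: 2/7/2025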

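Model overview

This is a fine-tuned audio classification model intended for recognizing speaker intent. It is based on the wav2vec2 architecture and was fine-tuned on the minds14 dataset.

Model features

Based on the wav2vec2 architecture

Uses Facebook's open-source wav2vec2-base model as the base architecture

Lightweight fine-tuning

Fine-tuned for 10 epochs on the minds14 dataset

Model capabilities

Audio classification

Speaker intent recognition

Use cases

Voice interaction

Intent recognition for voice assistants

Identifying the intent a user expresses through speech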

🚀 My Awesome Mind Model

This model is a fine-tuned version of facebook/wav2vec2-base on the minds14 dataset. It achieves the following results on the evaluation set:

Loss: 2.6577
Accuracy: 0.0619

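🚀 Quick start

This guide shows how to:

Fine-tune Wav2Vec2 on the MInDS-14 dataset to classify speaker intent.
Use the fine-tuned model for inference.

The task demonstrated in this tutorial is supported by the following model architectures: Audio Spectrogram Transformer, Data2VecAudio, Hubert, SEW, SEW-D, UniSpeech, UniSpeechSat, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Whisper

Before you begin, make sure all the necessary libraries are installed: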

pip install transformers datasets evaluate

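Logging in to your Hugging Face account is recommended so that you can upload the model and share it with the community. When prompted, enter your token to log in: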

from huggingface_hub import notebook_login

notebook_login()

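✨ Key features

Fine-tuned from the pretrained facebook/wav2vec2-base model for audio classification.
Classifies speaker intent; trained and evaluated on the MInDS-14 dataset.

📦 Installation

Install the required libraries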

pip install transformers datasets evaluate

Install from source (optional)

To install Transformers from source instead of the latest release, use the following command:

pip install git+https://github.com/huggingface/transformers.git

💻 Usage examples

Basic usage

Load the dataset

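from datasets import load_dataset, Audio

minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
minds = minds.train_test_split(test_size=0.2)
minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"])
# MInDS-14 audio is recorded at 8 kHz; resample to the 16 kHz that wav2vec2-base expects
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))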

Preprocess the data

from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

def preprocess_function(examples):
    audio_arrays = [x["array"] for x in examples["audio"]]
    inputs = feature_extractor(
        audio_arrays, sampling_rate=feature_extractor.sampling_rate, max_length=16000, truncation=True
    )
    return inputs

encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True)
encoded_minds = encoded_minds.rename_column("intent_class", "label")
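
As a rough illustration of what `max_length=16000` with `truncation=True` means in the preprocessing step: at a 16 kHz sampling rate, 16000 samples correspond to one second of audio, so longer clips are cut to their first second while shorter clips keep their length. The helpers below are hypothetical pure-Python stand-ins for that behavior, not the feature extractor's actual implementation.

```python
SAMPLING_RATE = 16_000  # Hz, the rate wav2vec2-base expects
MAX_LENGTH = 16_000     # samples, the max_length passed to the feature extractor


def truncate_clip(samples, max_length=MAX_LENGTH):
    """Keep at most the first `max_length` samples (hypothetical helper)."""
    return samples[:max_length]


def duration_seconds(samples, sampling_rate=SAMPLING_RATE):
    """Length of a clip in seconds at the given sampling rate."""
    return len(samples) / sampling_rate


long_clip = [0.0] * 40_000   # 2.5 s of silence at 16 kHz
short_clip = [0.0] * 8_000   # 0.5 s of silence at 16 kHz

print(duration_seconds(truncate_clip(long_clip)))   # 1.0
print(duration_seconds(truncate_clip(short_clip)))  # 0.5
```

If one second turns out to be too short to capture an utterance, `max_length` is the parameter to revisit.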

Evaluation metric

import evaluate

accuracy = evaluate.load("accuracy")

import numpy as np

def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)
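
To make the metric concrete, here is a minimal sketch of the same argmax-then-accuracy computation in plain Python, without `numpy` or `evaluate`. The function names and the toy logits are illustrative assumptions, not part of the original training script.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(pred == ref for pred, ref in zip(predictions, labels))
    return correct / len(labels)


# Toy batch: 4 examples, 3 classes (made-up numbers)
toy_logits = [
    [0.1, 2.0, -1.0],  # predicts class 1
    [1.5, 0.2, 0.3],   # predicts class 0
    [0.0, 0.1, 0.9],   # predicts class 2
    [0.4, 0.3, 0.2],   # predicts class 0
]
toy_labels = [1, 0, 1, 0]

print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
```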

Train the model

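from transformers import AutoModelForAudioClassification, TrainingArguments, Trainer

# Build the label <-> id mappings from the dataset's intent class names
labels = minds["train"].features["intent_class"].names
label2id = {label: str(i) for i, label in enumerate(labels)}
id2label = {str(i): label for i, label in enumerate(labels)}

num_labels = len(id2label)
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label
)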

training_args = TrainingArguments(
    output_dir="my_awesome_mind_model",
    eval_strategy="epoch",  # named evaluation_strategy in older Transformers releases
    save_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_minds["train"],
    eval_dataset=encoded_minds["test"],
    processing_class=feature_extractor,  # replaces the deprecated tokenizer= argument in recent Transformers releases
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.push_to_hub()
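
The batch-size settings above determine how many optimizer steps a run takes. The sketch below is a simplified model of that arithmetic, assuming a single device and roughly 450 training examples after the 80/20 split; the source does not state the example count, so treat that number as an assumption. Under those assumptions the result agrees with the 30 total steps and the fractional epochs (0.8, 1.8, ...) in the training results table.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, num_epochs):
    """Simplified step count: batches per epoch, optimizer steps per epoch, total steps."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return batches_per_epoch, steps_per_epoch, steps_per_epoch * num_epochs


ASSUMED_TRAIN_EXAMPLES = 450  # assumption: not stated in the source

batches, steps_per_epoch, total_steps = optimizer_steps(
    ASSUMED_TRAIN_EXAMPLES, per_device_batch_size=32, grad_accum_steps=4, num_epochs=10
)

print(batches)          # 15 batches per epoch
print(steps_per_epoch)  # 3 optimizer steps per epoch
print(total_steps)      # 30 optimizer steps overall

# Three optimizer steps consume 12 of the 15 batches, i.e. 0.8 of an epoch
print(steps_per_epoch * 4 / batches)  # 0.8
```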

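Advanced usage

Inference

# Load an audio file for inference; resample it if needed so its sampling rate matches the model's
from transformers import pipeline

# Instantiate the audio classification pipeline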
classifier = pipeline("audio-classification", model="your_model_name_on_huggingface")
audio_file = "your_audio_file.wav"
result = classifier(audio_file)
print(result)
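
The audio classification pipeline returns a list of dictionaries, each holding a `label` and a `score`. A minimal sketch of picking the top prediction follows; the `example_result` values are made up for illustration and are not real output from this model, and `top_prediction` is a hypothetical helper.

```python
def top_prediction(results):
    """Return the entry with the highest score from a pipeline-style result list."""
    return max(results, key=lambda item: item["score"])


# Hypothetical output shape; scores and labels are illustrative only
example_result = [
    {"score": 0.09, "label": "cash_deposit"},
    {"score": 0.08, "label": "app_error"},
    {"score": 0.07, "label": "freeze"},
]

best = top_prediction(example_result)
print(best["label"], best["score"])
```

With scores this close together, as would be expected from a model at the accuracy reported for this checkpoint, the top label carries little information on its own.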

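📚 Detailed documentation

Model description

Base model used for fine-tuning: facebook/wav2vec2-base

Intended uses and limitations

Useful for getting familiar with the process of fine-tuning a pretrained model; it is not a production-ready model.

Training and evaluation data

Training dataset: https://huggingface.co/datasets/PolyAI/minds14. You can also use your own data, preprocessed for training.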

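Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Attribute	Details
Learning rate	3e-05
Train batch size	32
Eval batch size	32
Seed	42
Gradient accumulation steps	4
Total train batch size	128
Optimizer	OptimizerNames.ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
Learning rate scheduler type	linear
Learning rate scheduler warmup ratio	0.1
Number of epochs	10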
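
Two of these values can be derived from the others. The total train batch size is the per-device batch size times the gradient accumulation steps, and the linear scheduler with a 0.1 warmup ratio ramps the learning rate up over the first tenth of training before decaying it linearly to zero. The sketch below assumes 30 total optimizer steps (the final step in the training results table) and rounds the warmup length up; `linear_warmup_lr` is a hypothetical helper, not the library's scheduler.

```python
import math

PER_DEVICE_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
print(PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS)  # 128, the total train batch size


def linear_warmup_lr(step, total_steps=30, base_lr=3e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at the final step
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 3, 15, 30):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the peak rate of 3e-05 is reached at step 3, so most of the run is spent in the decay phase.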

Training results

Training loss	Epoch	Step	Validation loss	Accuracy
No log	0.8	3	2.6463	0.0619
No log	1.8	6	2.6525	0.0442
No log	2.8	9	2.6524	0.0619
3.0286	3.8	12	2.6569	0.0619
3.0286	4.8	15	2.6572	0.0531
3.0286	5.8	18	2.6546	0.0619
3.0109	6.8	21	2.6593	0.0708
3.0109	7.8	24	2.6585	0.0531
3.0109	8.8	27	2.6569	0.0619
3.0047	9.8	30	2.6577	0.0619
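
One way to read these numbers is to compare them with a uniform random guess over the 14 MInDS-14 intent classes. The short sketch below copies the step, validation loss, and accuracy columns from the table and computes that baseline; it assumes roughly balanced classes, which the source does not confirm.

```python
NUM_INTENT_CLASSES = 14  # MInDS-14 defines 14 intent classes

# (step, validation loss, accuracy) copied from the training results table
rows = [
    (3, 2.6463, 0.0619),
    (6, 2.6525, 0.0442),
    (9, 2.6524, 0.0619),
    (12, 2.6569, 0.0619),
    (15, 2.6572, 0.0531),
    (18, 2.6546, 0.0619),
    (21, 2.6593, 0.0708),
    (24, 2.6585, 0.0531),
    (27, 2.6569, 0.0619),
    (30, 2.6577, 0.0619),
]

chance_accuracy = 1 / NUM_INTENT_CLASSES
best_step, best_loss, best_accuracy = max(rows, key=lambda row: row[2])

print(round(chance_accuracy, 4))                       # 0.0714
print(best_step, best_accuracy)                        # 21 0.0708
print(all(acc < chance_accuracy for _, _, acc in rows))  # True
```

Every evaluation, including the best one at step 21, sits at or below the uniform-guess baseline, which is consistent with the note that this model is meant for learning the fine-tuning workflow and not for production use.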

Framework versions

Transformers 4.48.2
PyTorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0

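🔧 Technical details

Training starts from the pretrained facebook/wav2vec2-base model and fine-tunes it on the MInDS-14 dataset with the Trainer class, using the hyperparameters listed under the training procedure (learning rate 3e-05, batch size 32, and so on). During preprocessing, AutoFeatureExtractor converts the audio into the input format the model requires. The accuracy metric is loaded with the evaluate library and used to evaluate the model during training.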