# my_awesome_mind_model
This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the MInDS-14 dataset. It can be used for audio classification tasks such as identifying speaker intent.
## 🚀 Quick Start
This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the MInDS-14 dataset. It achieves the following results on the evaluation set:
- Loss: 2.6577
- Accuracy: 0.0619
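For a quick sanity check, the fine-tuned checkpoint can be run through the `pipeline` API. The repository ID and file path below are hypothetical placeholders; substitute the actual location of your fine-tuned checkpoint (local or on the Hub):

```python
from transformers import pipeline

# "your-username/my_awesome_mind_model" is a placeholder repo ID, not a published checkpoint.
classifier = pipeline("audio-classification", model="your-username/my_awesome_mind_model")

# Classify a local audio file; returns a list of {"label", "score"} dicts sorted by score.
predictions = classifier("path/to/audio.wav")
print(predictions)
```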
## ✨ Features
- Built on the pre-trained [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) model and usable for audio classification tasks.
- Useful for familiarizing yourself with the process of fine-tuning a pre-trained model.
## 📦 Installation
```bash
pip install transformers datasets
# To install from source instead of the latest release, comment the command above and uncomment the following one.
# pip install git+https://github.com/huggingface/transformers.git
```
## 💻 Usage Examples
### Basic Usage

```python
# Load the MInDS-14 dataset
from datasets import load_dataset, Audio

minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
minds = minds.train_test_split(test_size=0.2)
minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"])

# Load the feature extractor of the base model
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Resample the audio column to the 16 kHz sampling rate the model expects
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
```
### Advanced Usage

```python
import torch
from transformers import AutoModelForAudioClassification

# Create the label mappings from the dataset's intent classes
labels = minds["train"].features["intent_class"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

# Load the fine-tuned model ("your_model_path" is a placeholder for your checkpoint)
model = AutoModelForAudioClassification.from_pretrained(
    "your_model_path", label2id=label2id, id2label=id2label
)

# Preprocess a single audio example
inputs = feature_extractor(
    minds["train"][0]["audio"]["array"], sampling_rate=16000, return_tensors="pt"
)

# Run inference and map the predicted class id back to its label
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
predicted_label = model.config.id2label[predicted_class_id]
print(predicted_label)
```
## 📚 Documentation
### Model description

Base model used for fine-tuning: [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base).
### Intended uses & limitations

Use it to familiarize yourself with fine-tuning a pre-trained model. This is not a production-ready model.
### Training and evaluation data

The training dataset is [PolyAI/minds14](https://huggingface.co/datasets/PolyAI/minds14). You can also bring your own data and preprocess it for training, as sketched below.
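As a rough sketch of that preprocessing (assuming the `feature_extractor` and the `minds` splits from the usage examples above, and a maximum length chosen purely for illustration), the standard `Dataset.map` pattern looks like this:

```python
def preprocess_function(examples):
    # Turn raw waveforms into model inputs, truncating so batches have a uniform length.
    audio_arrays = [x["array"] for x in examples["audio"]]
    return feature_extractor(
        audio_arrays,
        sampling_rate=feature_extractor.sampling_rate,
        max_length=16000,  # illustrative 1-second cap at 16 kHz
        truncation=True,
    )

# Apply the preprocessing in batches and rename the label column to what the model expects.
encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True)
encoded_minds = encoded_minds.rename_column("intent_class", "label")
```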
### Training procedure

#### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
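For reference, these values map onto `TrainingArguments` roughly as in the sketch below; the output directory and any settings not listed above are assumptions, not taken from the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my_awesome_mind_model",  # assumed output directory
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,       # 32 * 4 = 128 total train batch size
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)
```

These arguments would then be passed to a `Trainer` together with the model, the preprocessed train and test splits, the feature extractor, and a `compute_metrics` function such as the accuracy sketch in the Technical Details section.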
#### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 0.8   | 3    | 2.6463          | 0.0619   |
| No log        | 1.8   | 6    | 2.6525          | 0.0442   |
| No log        | 2.8   | 9    | 2.6524          | 0.0619   |
| 3.0286        | 3.8   | 12   | 2.6569          | 0.0619   |
| 3.0286        | 4.8   | 15   | 2.6572          | 0.0531   |
| 3.0286        | 5.8   | 18   | 2.6546          | 0.0619   |
| 3.0109        | 6.8   | 21   | 2.6593          | 0.0708   |
| 3.0109        | 7.8   | 24   | 2.6585          | 0.0531   |
| 3.0109        | 8.8   | 27   | 2.6569          | 0.0619   |
| 3.0047        | 9.8   | 30   | 2.6577          | 0.0619   |
#### Framework versions
- Transformers 4.48.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
## 🔧 Technical Details
### Inference

Load an audio file you'd like to run inference on. Remember to resample the audio to the model's 16 kHz sampling rate if it was recorded at a different rate; a minimal sketch follows.
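One way to handle that resampling is with the `datasets` library (the file path below is a hypothetical placeholder):

```python
from datasets import Audio, Dataset

# Wav2Vec2 expects 16 kHz audio; casting the column resamples lazily when the example is decoded.
dataset = Dataset.from_dict({"audio": ["path/to/audio.wav"]}).cast_column(
    "audio", Audio(sampling_rate=16_000)
)
waveform = dataset[0]["audio"]["array"]  # NumPy array at 16 kHz, ready for the feature extractor
```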
### Audio classification

Audio classification, just like text classification, assigns a class label to the input data. The only difference is that instead of text inputs, you have raw audio waveforms. Some practical applications of audio classification include identifying speaker intent, language classification, and even recognizing animal species by their sounds.
This guide will show you how to:
- Finetune Wav2Vec2 on the MInDS-14 dataset to classify speaker intent.
- Use your finetuned model for inference.
The task illustrated in this tutorial is supported by the following model architectures: Audio Spectrogram Transformer, Data2VecAudio, Hubert, SEW, SEW-D, UniSpeech, UniSpeechSat, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Whisper.
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install transformers datasets evaluate
```
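The `evaluate` library is typically used for the accuracy metric during fine-tuning; a common pattern (not necessarily the exact code used for this model) is the sketch below:

```python
import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred carries the model logits and the reference labels for the evaluation set.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)
```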
We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
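From a notebook, this can be done with the `huggingface_hub` helper:

```python
from huggingface_hub import notebook_login

# Opens an interactive prompt for your Hugging Face access token.
notebook_login()
```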
## 📄 License

This model is licensed under the Apache-2.0 license.