# my_awesome_mind_model
This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the MInDS-14 dataset. It can be used for audio classification tasks such as identifying speaker intent.
## 🚀 Quick Start
This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the MInDS-14 dataset. It achieves the following results on the evaluation set:
- Loss: 2.6577
- Accuracy: 0.0619
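For a quick sanity check, the fine-tuned checkpoint can be run through the `pipeline` API. The repository ID and file path below are hypothetical placeholders; substitute the actual location of your fine-tuned checkpoint (local or on the Hub):

```python
from transformers import pipeline

# "your-username/my_awesome_mind_model" is a placeholder repo ID, not a published checkpoint.
classifier = pipeline("audio-classification", model="your-username/my_awesome_mind_model")

# Classify a local audio file; returns a list of {"label", "score"} dicts sorted by score.
predictions = classifier("path/to/audio.wav")
print(predictions)
```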
## ✨ Features
- Built on the pre-trained [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) model and usable for audio classification tasks.
- Useful for familiarizing yourself with the process of fine-tuning a pre-trained model.
## 📦 Installation
```bash
pip install transformers datasets
# To install from source instead of the latest release, comment the command above and uncomment the following one.
# pip install git+https://github.com/huggingface/transformers.git
```
## 💻 Usage Examples
### Basic Usage

```python
# Load the MInDS-14 dataset
from datasets import load_dataset, Audio

minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
minds = minds.train_test_split(test_size=0.2)
minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"])

# Load the feature extractor of the base model
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Resample the audio column to the 16 kHz sampling rate the model expects
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
```
### Advanced Usage

```python
import torch
from transformers import AutoModelForAudioClassification

# Create the label mappings from the dataset's intent classes
labels = minds["train"].features["intent_class"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label

# Load the fine-tuned model ("your_model_path" is a placeholder for your checkpoint)
model = AutoModelForAudioClassification.from_pretrained(
    "your_model_path", label2id=label2id, id2label=id2label
)

# Preprocess a single audio example
inputs = feature_extractor(
    minds["train"][0]["audio"]["array"], sampling_rate=16000, return_tensors="pt"
)

# Run inference and map the predicted class id back to its label
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
predicted_label = model.config.id2label[predicted_class_id]
print(predicted_label)
```
## 📚 Documentation
### Model description

Base model used for fine-tuning: [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base).
### Intended uses & limitations

Use it to familiarize yourself with fine-tuning a pre-trained model. This is not a production-ready model.
### Training and evaluation data

The training dataset is [PolyAI/minds14](https://huggingface.co/datasets/PolyAI/minds14). You can also bring your own data and preprocess it for training, as sketched below.
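As a rough sketch of that preprocessing (assuming the `feature_extractor` and the `minds` splits from the usage examples above, and a maximum length chosen purely for illustration), the standard `Dataset.map` pattern looks like this:

```python
def preprocess_function(examples):
    # Turn raw waveforms into model inputs, truncating so batches have a uniform length.
    audio_arrays = [x["array"] for x in examples["audio"]]
    return feature_extractor(
        audio_arrays,
        sampling_rate=feature_extractor.sampling_rate,
        max_length=16000,  # illustrative 1-second cap at 16 kHz
        truncation=True,
    )

# Apply the preprocessing in batches and rename the label column to what the model expects.
encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True)
encoded_minds = encoded_minds.rename_column("intent_class", "label")
```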
### Training procedure

#### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
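For reference, these values map onto `TrainingArguments` roughly as in the sketch below; the output directory and any settings not listed above are assumptions, not taken from the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my_awesome_mind_model",  # assumed output directory
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,       # 32 * 4 = 128 total train batch size
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
)
```

These arguments would then be passed to a `Trainer` together with the model, the preprocessed train and test splits, the feature extractor, and a `compute_metrics` function such as the accuracy sketch in the Technical Details section.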
#### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 0.8   | 3    | 2.6463          | 0.0619   |
| No log        | 1.8   | 6    | 2.6525          | 0.0442   |
| No log        | 2.8   | 9    | 2.6524          | 0.0619   |
| 3.0286        | 3.8   | 12   | 2.6569          | 0.0619   |
| 3.0286        | 4.8   | 15   | 2.6572          | 0.0531   |
| 3.0286        | 5.8   | 18   | 2.6546          | 0.0619   |
| 3.0109        | 6.8   | 21   | 2.6593          | 0.0708   |
| 3.0109        | 7.8   | 24   | 2.6585          | 0.0531   |
| 3.0109        | 8.8   | 27   | 2.6569          | 0.0619   |
| 3.0047        | 9.8   | 30   | 2.6577          | 0.0619   |
#### Framework versions
- Transformers 4.48.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
## 🔧 Technical Details
### Inference

Load an audio file you'd like to run inference on. Remember to resample the audio to the model's 16 kHz sampling rate if it was recorded at a different rate; a minimal sketch follows.
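One way to handle that resampling is with the `datasets` library (the file path below is a hypothetical placeholder):

```python
from datasets import Audio, Dataset

# Wav2Vec2 expects 16 kHz audio; casting the column resamples lazily when the example is decoded.
dataset = Dataset.from_dict({"audio": ["path/to/audio.wav"]}).cast_column(
    "audio", Audio(sampling_rate=16_000)
)
waveform = dataset[0]["audio"]["array"]  # NumPy array at 16 kHz, ready for the feature extractor
```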
### Audio classification

Audio classification, just like text classification, assigns a class label to the input data. The only difference is that instead of text inputs, you have raw audio waveforms. Some practical applications of audio classification include identifying speaker intent, language classification, and even recognizing animal species by their sounds.
This guide will show you how to:
- Finetune Wav2Vec2 on the MInDS-14 dataset to classify speaker intent.
- Use your finetuned model for inference.
The task illustrated in this tutorial is supported by the following model architectures: Audio Spectrogram Transformer, Data2VecAudio, Hubert, SEW, SEW-D, UniSpeech, UniSpeechSat, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Whisper.
Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install transformers datasets evaluate
```
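The `evaluate` library is typically used for the accuracy metric during fine-tuning; a common pattern (not necessarily the exact code used for this model) is the sketch below:

```python
import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred carries the model logits and the reference labels for the evaluation set.
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=eval_pred.label_ids)
```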
We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
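From a notebook, this can be done with the `huggingface_hub` helper:

```python
from huggingface_hub import notebook_login

# Opens an interactive prompt for your Hugging Face access token.
notebook_login()
```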
## 📄 License

This model is licensed under the Apache-2.0 license.