hubert-finetuned-animals
A fine-tuned version of facebook/hubert-base-ls960 for animal sound classification.
Quick Start
Try the model here:
Animal Sound Classification Spaces
Features
This model, hubert-finetuned-animals, is a fine-tuned version of facebook/hubert-base-ls960 built specifically for animal sound classification. It recognizes the sounds of dogs, roosters, pigs, cows, frogs, cats, hens, insects, sheep, and crows, which can be useful in bioacoustic monitoring, educational tools, and wildlife conservation efforts.
Installation
The original README lists no installation steps. The usage example imports librosa, torch, and transformers, so those packages need to be available.
Usage Examples
Basic Usage
import librosa
import torch
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# Load the fine-tuned model and its feature extractor
model_name = "ardneebwar/wav2vec2-animal-sounds-finetuned-hubert-finetuned-animals"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = HubertForSequenceClassification.from_pretrained(model_name)

# Use a GPU when one is available and switch to inference mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def predict_audio_class(audio_file, feature_extractor, model, device):
    # Load the audio, resampled to 16 kHz
    speech, sr = librosa.load(audio_file, sr=16000)
    input_values = feature_extractor(speech, return_tensors="pt", sampling_rate=16000).input_values
    input_values = input_values.to(device)

    # Run inference without tracking gradients
    with torch.no_grad():
        logits = model(input_values).logits

    # Map the highest-scoring logit to its label
    predicted_id = torch.argmax(logits, dim=-1)
    predicted_class = model.config.id2label[predicted_id.item()]
    return predicted_class

audio_file_path = "path_to_audio_file.wav"
predicted_class = predict_audio_class(audio_file_path, feature_extractor, model, device)
print(f"Predicted class: {predicted_class}")
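The function above returns only the single best label. To see how the score is spread across classes, the logits can be passed through a softmax and ranked. The following is a minimal sketch in plain Python, assuming the logits are available as a list of floats (for example via `logits[0].tolist()`) and that the label map looks like `model.config.id2label`. The helper name `top_k_classes` and the sample values are hypothetical.

```python
import math

def top_k_classes(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label map, for illustration only
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "dog", 1: "cat", 2: "cow"}
for label, prob in top_k_classes(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

A softmax probability is not a calibrated confidence, so a high value on a sound outside the training categories should be read with caution.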
Documentation
Model description
HuBERT, originally pretrained on large amounts of unlabelled audio, is fine-tuned here for the downstream task of animal sound classification. Fine-tuning lets the model specialize in recognizing distinct animal sounds, which is useful in applications such as bioacoustic monitoring, educational tools, and more interactive wildlife conservation efforts.
Intended uses & limitations
This model is intended for the classification of specific animal sounds within audio clips. It can be used in software applications related to wildlife research, educational content related to animals, or for entertainment purposes where animal sound recognition is needed.
Limitations
Although the model reaches 0.95 accuracy in the final training epoch (see Training results), it was trained on a limited set of categories from the ESC-50 dataset and does not cover all animal sounds. Performance can vary significantly with audio quality, background noise, and variations in animal sounds that are not represented in the training data.
Training and evaluation data
The model was fine-tuned on a subset of the ESC-50 dataset, a publicly available collection designed for environmental sound classification. The subset includes only the animal sound categories. Each category contains 40 examples, providing a diverse set of samples for model training and evaluation.
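Selecting the animal subset can be sketched as a filter over the dataset's metadata. The example below is illustrative only: it assumes a metadata CSV with `filename` and `category` columns (the layout used by the public ESC-50 release, to the best of our knowledge) and uses a tiny made-up sample instead of the real file. The helper name `select_animal_rows` is hypothetical.

```python
import csv
import io
from collections import Counter

# The ten animal categories named in this README, in ESC-50 label spelling (assumed)
ANIMAL_CATEGORIES = {
    "dog", "rooster", "pig", "cow", "frog",
    "cat", "hen", "insects", "sheep", "crow",
}

def select_animal_rows(csv_text):
    """Keep only rows whose 'category' column is an animal class."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["category"] in ANIMAL_CATEGORIES]

# Tiny made-up sample in the assumed metadata layout; not real ESC-50 rows
sample_csv = """filename,fold,target,category
a.wav,1,0,dog
b.wav,1,10,rain
c.wav,2,5,cat
d.wav,3,0,dog
"""

rows = select_animal_rows(sample_csv)
print(Counter(row["category"] for row in rows))
```

On the real metadata, the same count should show 40 clips for each of the ten categories if the subset matches the description above.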
Training procedure
The model was fine-tuned using the following procedure:
- Preprocessing: Audio files were converted into spectrograms.
- Data Split: The data was split into 70% training, 20% test, and 10% validation sets.
- Fine-tuning: The model was fine-tuned for 10 epochs on the training set.
- Evaluation: The model's performance was evaluated on the validation set after each epoch to monitor improvement and prevent overfitting.
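The 70/20/10 split described above can be sketched as follows. This is a minimal illustration, not the original training script: the README does not say whether the split was stratified by class, so the per-class split here is an assumption, as is reusing seed 42. The function name `stratified_split` and the placeholder clip ids are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(items, train=0.7, test=0.2, seed=42):
    """Split (file_id, label) pairs per class into train/test/validation lists."""
    by_label = defaultdict(list)
    for file_id, label in items:
        by_label[label].append(file_id)

    rng = random.Random(seed)
    splits = {"train": [], "test": [], "validation": []}
    for label in sorted(by_label):
        files = by_label[label]
        rng.shuffle(files)
        n_train = round(len(files) * train)
        n_test = round(len(files) * test)
        splits["train"] += [(f, label) for f in files[:n_train]]
        splits["test"] += [(f, label) for f in files[n_train:n_train + n_test]]
        # Whatever remains (10% here) becomes the validation set
        splits["validation"] += [(f, label) for f in files[n_train + n_test:]]
    return splits

# Placeholder ids: 10 classes with 40 clips each, mirroring the dataset description
items = [(f"class{c}_clip{i}", c) for c in range(10) for i in range(40)]
splits = stratified_split(items)
print({name: len(part) for name, part in splits.items()})
```

Splitting within each class keeps every category represented in all three sets, which matters when each class has only 40 clips.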
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|---|---|
| learning_rate | 5e-05 |
| train_batch_size | 8 |
| eval_batch_size | 8 |
| seed | 42 |
| optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_ratio | 0.1 |
| num_epochs | 10 |
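To make the scheduler settings concrete, the sketch below computes the learning rate that a linear schedule with a 0.1 warmup ratio would give at a few steps. It assumes the common formulation (a linear ramp from 0 to the peak rate, then linear decay to 0) and takes the 450 total steps from the training results. The function name `linear_warmup_lr` is hypothetical and this is not the training code itself.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# 450 total steps is taken from the last row of the training results
for step in (0, 45, 225, 450):
    print(step, linear_warmup_lr(step, total_steps=450))
```

Under these assumptions the warmup covers the first 45 steps, which is exactly the first epoch in the results table.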
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 2.1934 | 1.0 | 45 | 2.1765 | 0.3 |
| 2.0239 | 2.0 | 90 | 1.8169 | 0.45 |
| 1.7745 | 3.0 | 135 | 1.4817 | 0.65 |
| 1.3787 | 4.0 | 180 | 1.2497 | 0.75 |
| 1.2168 | 5.0 | 225 | 1.0048 | 0.85 |
| 1.0359 | 6.0 | 270 | 0.9969 | 0.775 |
| 0.7983 | 7.0 | 315 | 0.7467 | 0.9 |
| 0.7466 | 8.0 | 360 | 0.7698 | 0.85 |
| 0.6284 | 9.0 | 405 | 0.6097 | 0.9 |
| 0.8365 | 10.0 | 450 | 0.5596 | 0.95 |
Framework versions
- Transformers 4.33.2
- PyTorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
GitHub Repository
Animal Sound Classification
License
This model is licensed under the Apache 2.0 license.