Open-source wav2vec2 Animal Sound Classification Model - Free Deployment for Accurate Recognition of 10 Types of Animal Calls

Wav2vec2 Animal Sounds Finetuned Hubert Finetuned Animals

Developed by ardneebwar

An animal sound classification model fine-tuned from HuBERT. It recognizes 10 types of animal calls and reaches 95% accuracy on its evaluation set.

Audio Classification

Transformers

Open Source License: Apache-2.0

Tags: #Animal sound recognition #Bioacoustic analysis #High-precision audio classification

Downloads 555

Release Time: 9/26/2023

Model Overview

This model is designed to recognize the animal sound categories of the ESC-50 dataset. It covers 10 types of sounds, such as dog barks and chicken clucks, and is suited to bioacoustic monitoring and educational applications.

Model Features

High-precision classification

Achieves 95% accuracy on the evaluation set

Multi-category recognition

Supports classification of 10 common animal sounds

Transfer learning optimization

Fine-tuned from the facebook/hubert-base-ls960 pre-trained model

Model Capabilities

Animal sound classification

Audio feature extraction

Environmental sound recognition

Use Cases

Wildlife research

Biodiversity monitoring

Automatically identify animal species in field recordings

Improves monitoring efficiency and reduces manual labeling costs

Educational applications

Animal science education tool

Helps children recognize different animal calls

Enhances interactive learning experience

🚀 hubert-finetuned-animals

A fine-tuned version of facebook/hubert-base-ls960 for animal sound classification.

🚀 Quick Start

Try the model here: Animal Sound Classification Spaces

✨ Features

This model, hubert-finetuned-animals, is a fine-tuned version of facebook/hubert-base-ls960 built for animal sound classification. It can recognize distinct animal sounds, such as those of dogs, roosters, pigs, cows, frogs, cats, hens, insects, sheep, and crows. This can be useful in bioacoustic monitoring, educational tools, and wildlife conservation efforts.

📦 Installation

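The original README provides no installation steps. The usage example below imports librosa, torch, and transformers, so those packages need to be available in your Python environment.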

💻 Usage Examples

Basic Usage

import librosa
import torch
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

# Load the fine-tuned model and feature extractor
model_name = "ardneebwar/wav2vec2-animal-sounds-finetuned-hubert-finetuned-animals"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = HubertForSequenceClassification.from_pretrained(model_name)

# Prepare the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # Set the model to evaluation mode

# Function to predict the class of an audio file
def predict_audio_class(audio_file, feature_extractor, model, device):
    # Load and preprocess the audio file
    speech, sr = librosa.load(audio_file, sr=16000)
    input_values = feature_extractor(speech, return_tensors="pt", sampling_rate=16000).input_values
    input_values = input_values.to(device)

    # Predict
    with torch.no_grad():
        logits = model(input_values).logits

    # Get the predicted class ID
    predicted_id = torch.argmax(logits, dim=-1)
    # Convert the predicted ID to the class name
    predicted_class = model.config.id2label[predicted_id.item()]
    
    return predicted_class

# Replace 'path_to_audio_file.wav' with the actual path to your audio file
audio_file_path = "path_to_audio_file.wav"
predicted_class = predict_audio_class(audio_file_path, feature_extractor, model, device)
print(f"Predicted class: {predicted_class}")
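The function above returns only the single most likely label. When a ranked list with confidence scores is more useful, a softmax over the logits can be sketched as below. This is an illustrative helper, not part of the original README: the name `top_k_from_logits` and the sample labels and logit values are hypothetical, and it assumes the logits have been converted to a flat Python list, for example with `logits[0].tolist()`.

```python
import math


def top_k_from_logits(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs.

    logits: flat list of floats, one per class (assumed input format).
    id2label: mapping from class index to class name,
              e.g. model.config.id2label in the example above.
    """
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and labels, for illustration only
example_labels = {0: "dog", 1: "cat", 2: "cow"}
example_logits = [2.0, 0.5, -1.0]

for label, prob in top_k_from_logits(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

Inspecting the runner-up probabilities can help flag uncertain predictions, for instance on noisy field recordings.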

📚 Documentation

Model description

The HuBERT model, originally trained on large amounts of unlabelled audio data, has been fine-tuned here for the downstream task of animal sound classification. Fine-tuning lets the model specialize in recognizing distinct animal sounds, which can be particularly useful in applications such as bioacoustic monitoring, educational tools, and more interactive wildlife conservation efforts.

Intended uses & limitations

This model is intended for the classification of specific animal sounds within audio clips. It can be used in software applications related to wildlife research, educational content related to animals, or for entertainment purposes where animal sound recognition is needed.

Limitations

While the model shows high accuracy, it was trained on a limited set of categories from the ESC-50 dataset, which may not cover all possible animal sounds. Performance can vary significantly with audio quality, background noise, and animal sound variations not represented in the training data.

Training and evaluation data

The model was fine-tuned on a subset of the ESC-50 dataset, which is a publicly available collection designed for environmental sound classification tasks. This subset includes only the categories relevant to animal sounds. Each category in the dataset contains 40 examples, providing a diverse set of samples for model training and evaluation.

Training procedure

The model was fine-tuned using the following procedure:

Preprocessing: Audio files were converted into spectrograms.
Data split: The data was split into 70% training, 20% test, and 10% validation sets.
Fine-tuning: The model was fine-tuned for 10 epochs on the training set.
Evaluation: The model's performance was evaluated on the validation set after each epoch to monitor improvement and prevent overfitting.
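The README does not say how the split was produced. As a minimal sketch, assuming a simple shuffled split over the 10 × 40 = 400 animal clips, the 70/20/10 partition could look like the following. The helper name `split_indices` is hypothetical, and the seed value is an assumption chosen only to make the shuffle reproducible.

```python
import random


def split_indices(n_items, train_frac=0.7, test_frac=0.2, seed=42):
    """Shuffle item indices and cut them into train/test/validation lists.

    Whatever remains after the train and test cuts becomes the validation set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched

    n_train = round(n_items * train_frac)
    n_test = round(n_items * test_frac)

    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]
    return train, test, val


# 10 animal categories with 40 clips each, as described in the data section
train_idx, test_idx, val_idx = split_indices(10 * 40)
print(len(train_idx), len(test_idx), len(val_idx))  # 280 80 40
```

A plain shuffle does not guarantee equal class counts in each split. A stratified split would be a reasonable alternative, but the source does not state which was used.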

Training hyperparameters

The following hyperparameters were used during training:

Property	Details
learning_rate	5e-05
train_batch_size	8
eval_batch_size	8
seed	42
optimizer	Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type	linear
lr_scheduler_warmup_ratio	0.1
num_epochs	10
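With a linear scheduler and a warmup ratio of 0.1, the learning rate would be expected to ramp up linearly from 0 to 5e-05 over the first 10% of optimizer steps and then decay linearly back to 0. The sketch below works this through, assuming the 450 total steps shown in the training results and the usual linear-warmup formula. `linear_schedule_lr` is a hypothetical helper for illustration, not the code used for training.

```python
import math


def linear_schedule_lr(step, base_lr=5e-05, total_steps=450, warmup_ratio=0.1):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    # Number of warmup steps derived from the ratio (an assumed convention)
    warmup_steps = math.ceil(total_steps * warmup_ratio)

    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)

    # Decay phase: scale linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at a few points across the 10 epochs
for step in (0, 20, 45, 225, 450):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the warmup would end roughly at the close of the first epoch, so the peak learning rate applies only briefly before decay begins.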

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
2.1934	1.0	45	2.1765	0.3
2.0239	2.0	90	1.8169	0.45
1.7745	3.0	135	1.4817	0.65
1.3787	4.0	180	1.2497	0.75
1.2168	5.0	225	1.0048	0.85
1.0359	6.0	270	0.9969	0.775
0.7983	7.0	315	0.7467	0.9
0.7466	8.0	360	0.7698	0.85
0.6284	9.0	405	0.6097	0.9
0.8365	10.0	450	0.5596	0.95

Framework versions

Transformers 4.33.2
PyTorch 2.0.1+cu118
Datasets 2.14.5
Tokenizers 0.13.3

GitHub Repository

Animal Sound Classification

📄 License

This model is licensed under the Apache 2.0 license.
