🚀 Model Card for LatAm Accent Determination
This Wav2Vec2 model is designed to classify audio according to the speaker's accent, distinguishing between Puerto Rican, Colombian, Venezuelan, Peruvian, and Chilean accents.
🚀 Quick Start
To get started with this model, you can refer to the GitHub Repo for more information.
✨ Features
- Classify audio clips into different Latin American Spanish accents.
- Trained on the crowdsourced OpenSLR corpora SLR71–SLR76.
📦 Installation
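The original card gives no installation steps; a typical setup for running Wav2Vec2 audio-classification models from the Hugging Face Hub (the exact dependency list is an assumption) is:

```bash
pip install transformers torch
# ffmpeg is also needed if you pass file paths to the audio-classification pipeline
```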
💻 Usage Examples
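The original card contains no code, so the following is a minimal sketch assuming the checkpoint is published on the Hub as an audio-classification model; the repo id `hsavich/latam-accent-classifier` is hypothetical, so substitute the actual one.

```python
# A minimal sketch: classify the accent of a short Spanish audio clip.
# "hsavich/latam-accent-classifier" is a hypothetical repo id.
from transformers import pipeline

classifier = pipeline("audio-classification", model="hsavich/latam-accent-classifier")
predictions = classifier("clip.wav")  # path to a ~5-second audio clip
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```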
📚 Documentation
Model Details
Model Description
This is a Wav2Vec2 model used to classify audio based on the speaker's accent as Puerto Rican, Colombian, Venezuelan, Peruvian, or Chilean.
- Developed by: Henry Savich
- Shared by [Optional]: Henry Savich
- Model type: Audio classification (accent identification)
- Language(s) (NLP): es
- License: openrail
- Parent Model: Wav2Vec2 Base
- Resources for more information: see the GitHub Repo referenced in the Quick Start section
Uses
Direct Use
Classify an audio clip as Puerto Rican, Peruvian, Venezuelan, Colombian, or Chilean Spanish.
Out-of-Scope Use
The model was trained on speakers reciting pre-chosen sentences, so it captures only acoustic differences and reflects no knowledge of lexical differences between dialects.
Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
Training Details
Training Data
OpenSLR resources SLR71–SLR76: crowdsourced Chilean, Colombian, Peruvian, Puerto Rican, and Venezuelan Spanish, plus Basque.
Training Procedure
Preprocessing
The train-test split was performed on speakers, so no speaker appears in both sets; this prevents the model from achieving high test accuracy simply by matching voices.
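The card does not show how the split was implemented; one minimal sketch, assuming per-clip speaker ids are available, uses scikit-learn's GroupShuffleSplit so that clips from the same speaker never land on both sides of the split:

```python
# Speaker-level train-test split: grouping by speaker id keeps all of a
# speaker's clips on one side of the split. Paths and ids are placeholders.
from sklearn.model_selection import GroupShuffleSplit

clips = ["a1.wav", "a2.wav", "b1.wav", "b2.wav", "c1.wav"]  # audio file paths
speakers = ["spk_a", "spk_a", "spk_b", "spk_b", "spk_c"]    # one id per clip

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(clips, groups=speakers))
train_clips = [clips[i] for i in train_idx]
test_clips = [clips[i] for i in test_idx]
print(train_clips, test_clips)
```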
Speeds, Sizes, Times
Trained on ~3,000 five-second audio clips. Training is lightweight, taking under an hour on Google Colaboratory premium GPUs.
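The card does not include the training code; the following is a minimal fine-tuning sketch, assuming 16 kHz mono clips of roughly five seconds labeled with the six OpenSLR classes. The hyperparameters shown are illustrative, not the ones actually used.

```python
# A minimal fine-tuning sketch (not the author's actual training code).
from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForSequenceClassification,
)

labels = ["chilean", "colombian", "peruvian", "puerto_rican", "venezuelan", "basque"]
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=len(labels)
)

def preprocess(batch):
    # Turn raw waveforms into fixed-length input_values for Wav2Vec2.
    inputs = extractor(
        batch["audio"],
        sampling_rate=16_000,
        max_length=16_000 * 5,  # pad/truncate to 5 seconds
        padding="max_length",
        truncation=True,
    )
    batch["input_values"] = inputs.input_values
    return batch

args = TrainingArguments(
    output_dir="latam-accent-model",
    per_device_train_batch_size=16,
    num_train_epochs=5,
    learning_rate=3e-5,
)
# train_ds / eval_ds would be the speaker-split datasets from above,
# mapped through `preprocess`:
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```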
Evaluation
Testing Data, Factors & Metrics
Testing Data
OpenSLR resources SLR71–SLR76 (the same corpora as training, with held-out speakers): https://huggingface.co/datasets/openslr
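For reference, one of these corpora can be loaded from the Hub roughly as follows; the "SLR71" config and "sentence" column names follow the openslr dataset card and are assumptions, and recent `datasets` releases may not support script-based loaders (an older library version may be required).

```python
# A hedged sketch: load the Chilean Spanish corpus (SLR71) from the Hub.
from datasets import load_dataset

chilean = load_dataset("openslr", "SLR71", split="train", trust_remote_code=True)
sample = chilean[0]
print(sample["audio"]["sampling_rate"], sample["sentence"])
```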
Factors
Audio quality: the training and testing data are higher quality than can be expected from found audio.
Metrics
Accuracy
Results
~85% accuracy, depending on the random train-test split.
Model Examination
Even when splitting on speakers, the model achieves strong accuracy on the test set. This is interesting because it indicates that accent classification, at least at this granularity, is an easier task than voice identification, which could just as easily have satisfied the training objective.
The confusion matrix shows that Basque is the most easily distinguished, which should be expected since it is the only non-Spanish language in the data. Puerto Rican was the hardest accent to identify in the test set, but I think this has more to do with Puerto Rico having the least data than with the accent itself.
I think that if a dataset of the same size but with more speakers were used for this experiment (leaving less room to fit individual voices), we could expect near-perfect accuracy.
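For reference, the confusion-matrix analysis above can be reproduced with scikit-learn; the label names and the y_true / y_pred placeholders below are illustrative assumptions:

```python
# Sketch of the confusion-matrix analysis; replace the placeholder lists
# with true and predicted labels for the held-out test set.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

labels = ["chilean", "colombian", "peruvian", "puerto_rican", "venezuelan", "basque"]
y_true = ["chilean", "basque", "colombian"]  # placeholder
y_pred = ["chilean", "basque", "peruvian"]   # placeholder

cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot(xticks_rotation=45)
plt.show()
```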
Technical Specs
Model Architecture and Objective
Wav2Vec2 (base), fine-tuned for audio classification.
Compute Infrastructure
Google Colaboratory Pro+
Hardware
Google Colaboratory Pro+ premium GPUs
Software
PyTorch via Hugging Face Transformers
Citation
No citation information is available for this model.
Model Card Authors
Henry Savich
Model Card Contact
henry.h.savich@vanderbilt.edu