🚀 Model Card for LatAm Accent Determination
This Wav2Vec2 model is designed to classify audio according to the speaker's accent, distinguishing between Puerto Rican, Colombian, Venezuelan, Peruvian, and Chilean accents.
🚀 Quick Start
To get started with this model, you can refer to the GitHub Repo for more information.
✨ Features
- Classify audio clips into different Latin American Spanish accents.
- Trained on the crowdsourced OpenSLR corpora SLR71–SLR76.
📦 Installation
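The original card gives no installation steps; a typical setup for running Wav2Vec2 audio-classification models from the Hugging Face Hub (the exact dependency list is an assumption) is:

```bash
pip install transformers torch
# ffmpeg is also needed if you pass file paths to the audio-classification pipeline
```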
💻 Usage Examples
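The original card contains no code, so the following is a minimal sketch assuming the checkpoint is published on the Hub as an audio-classification model; the repo id `hsavich/latam-accent-classifier` is hypothetical, so substitute the actual one.

```python
# A minimal sketch: classify the accent of a short Spanish audio clip.
# "hsavich/latam-accent-classifier" is a hypothetical repo id.
from transformers import pipeline

classifier = pipeline("audio-classification", model="hsavich/latam-accent-classifier")
predictions = classifier("clip.wav")  # path to a ~5-second audio clip
for p in predictions:
    print(f"{p['label']}: {p['score']:.3f}")
```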
📚 Documentation
Model Details
Model Description
This is a Wav2Vec2 model used to classify audio based on the speaker's accent as Puerto Rican, Colombian, Venezuelan, Peruvian, or Chilean.
- Developed by: Henry Savich
- Shared by [Optional]: Henry Savich
- Model type: Audio classification (accent identification)
- Language(s) (NLP): es
- License: openrail
- Parent Model: Wav2Vec2 Base
- Resources for more information: see the GitHub Repo referenced in the Quick Start section
Uses
Direct Use
Classify an audio clip as Puerto Rican, Peruvian, Venezuelan, Colombian, or Chilean Spanish.
Out-of-Scope Use
The model was trained on speakers reciting pre-chosen sentences, so it captures only acoustic differences and reflects no knowledge of lexical differences between dialects.
Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
Training Details
Training Data
OpenSLR resources SLR71–SLR76: crowdsourced Chilean, Colombian, Peruvian, Puerto Rican, and Venezuelan Spanish, plus Basque.
Training Procedure
Preprocessing
The train-test split was performed on speakers, so no speaker appears in both sets; this prevents the model from achieving high test accuracy simply by matching voices.
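The card does not show how the split was implemented; one minimal sketch, assuming per-clip speaker ids are available, uses scikit-learn's GroupShuffleSplit so that clips from the same speaker never land on both sides of the split:

```python
# Speaker-level train-test split: grouping by speaker id keeps all of a
# speaker's clips on one side of the split. Paths and ids are placeholders.
from sklearn.model_selection import GroupShuffleSplit

clips = ["a1.wav", "a2.wav", "b1.wav", "b2.wav", "c1.wav"]  # audio file paths
speakers = ["spk_a", "spk_a", "spk_b", "spk_b", "spk_c"]    # one id per clip

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(clips, groups=speakers))
train_clips = [clips[i] for i in train_idx]
test_clips = [clips[i] for i in test_idx]
print(train_clips, test_clips)
```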
Speeds, Sizes, Times
Trained on ~3,000 five-second audio clips. Training is lightweight, taking under an hour on Google Colaboratory premium GPUs.
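The card does not include the training code; the following is a minimal fine-tuning sketch, assuming 16 kHz mono clips of roughly five seconds labeled with the six OpenSLR classes. The hyperparameters shown are illustrative, not the ones actually used.

```python
# A minimal fine-tuning sketch (not the author's actual training code).
from transformers import (
    Trainer,
    TrainingArguments,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForSequenceClassification,
)

labels = ["chilean", "colombian", "peruvian", "puerto_rican", "venezuelan", "basque"]
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=len(labels)
)

def preprocess(batch):
    # Turn raw waveforms into fixed-length input_values for Wav2Vec2.
    inputs = extractor(
        batch["audio"],
        sampling_rate=16_000,
        max_length=16_000 * 5,  # pad/truncate to 5 seconds
        padding="max_length",
        truncation=True,
    )
    batch["input_values"] = inputs.input_values
    return batch

args = TrainingArguments(
    output_dir="latam-accent-model",
    per_device_train_batch_size=16,
    num_train_epochs=5,
    learning_rate=3e-5,
)
# train_ds / eval_ds would be the speaker-split datasets from above,
# mapped through `preprocess`:
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```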
Evaluation
Testing Data, Factors & Metrics
Testing Data
OpenSLR resources SLR71–SLR76 (the same corpora as training, with held-out speakers): https://huggingface.co/datasets/openslr
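For reference, one of these corpora can be loaded from the Hub roughly as follows; the "SLR71" config and "sentence" column names follow the openslr dataset card and are assumptions, and recent `datasets` releases may not support script-based loaders (an older library version may be required).

```python
# A hedged sketch: load the Chilean Spanish corpus (SLR71) from the Hub.
from datasets import load_dataset

chilean = load_dataset("openslr", "SLR71", split="train", trust_remote_code=True)
sample = chilean[0]
print(sample["audio"]["sampling_rate"], sample["sentence"])
```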
Factors
Audio quality: the training and testing data are higher quality than can be expected from found audio.
Metrics
Accuracy
Results
~85% accuracy, depending on the random train-test split.
Model Examination
Even when splitting on speakers, the model achieves strong accuracy on the test set. This is interesting because it indicates that accent classification, at least at this granularity, is an easier task than voice identification, which could just as easily have satisfied the training objective.
The confusion matrix shows that Basque is the most easily distinguished, which should be expected since it is the only non-Spanish language in the data. Puerto Rican was the hardest accent to identify in the test set, but I think this has more to do with Puerto Rico having the least data than with the accent itself.
I think that if a dataset of the same size but with more speakers were used for this experiment (leaving less room to fit individual voices), we could expect near-perfect accuracy.
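For reference, the confusion-matrix analysis above can be reproduced with scikit-learn; the label names and the y_true / y_pred placeholders below are illustrative assumptions:

```python
# Sketch of the confusion-matrix analysis; replace the placeholder lists
# with true and predicted labels for the held-out test set.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

labels = ["chilean", "colombian", "peruvian", "puerto_rican", "venezuelan", "basque"]
y_true = ["chilean", "basque", "colombian"]  # placeholder
y_pred = ["chilean", "basque", "peruvian"]   # placeholder

cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot(xticks_rotation=45)
plt.show()
```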
Technical Specs
Model Architecture and Objective
Wav2Vec2 (base), fine-tuned for audio classification.
Compute Infrastructure
Google Colaboratory Pro+
Hardware
Google Colaboratory Pro+ premium GPUs
Software
PyTorch via Hugging Face Transformers
Citation
No citation information is available for this model.
Model Card Authors
Henry Savich
Model Card Contact
henry.h.savich@vanderbilt.edu