CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice
English Accent Classifier with XLSR model
This project addresses accented speech in Automatic Speech Recognition (ASR) systems: integrating accent information can help mitigate errors caused by accented speech. It explores ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures for multilingual accent classification and provides tools for high-accuracy English accent classification.
Quick Start
This repository provides all the necessary tools to perform accent identification from speech recordings with SpeechBrain. The system uses a model pretrained on the CommonAccent dataset in English (16 accents).
Features
- Multilingual Accent Classification: Addresses multilingual accent classification with ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures.
- High Accuracy: Establishes a new state of the art for English accent classification, with accuracy of up to 95%.
- Simple Recipe: Introduces a simple-to-follow recipe, aligned with the SpeechBrain toolkit, for accent classification based on Common Voice datasets.
Installation
Install SpeechBrain
First of all, please install SpeechBrain with the following command:
pip install speechbrain
We encourage you to read our tutorials to learn more about SpeechBrain.
Usage Examples
Perform Accent Identification from Speech Recordings
from speechbrain.pretrained.interfaces import foreign_class
# Load the pretrained model and its custom XLSR interface from the Hugging Face Hub
classifier = foreign_class(source="Jzuluaga/accent-id-commonaccent_xlsr-en-english", pymodule_file="custom_interface.py", classname="CustomEncoderWav2vec2Classifier")
# out_prob: per-class scores; score/index: top score and its class index; text_lab: predicted accent label
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/us.wav')
print(text_lab)
# Same call on a Philippine English sample stored in the model repository
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/philippines.wav')
print(text_lab)
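classify_file returns more than the single best label. The plain-Python sketch below ranks accents by score so that runner-up candidates can be inspected. top_k_accents is a hypothetical helper, not part of SpeechBrain. It assumes out_prob holds one score per accent in label-index order; because the logarithm is monotonic, the ranking is the same whether those scores are probabilities or log-probabilities.

```python
from typing import List, Sequence, Tuple


def top_k_accents(scores: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (label, score) pairs with the highest scores, best first.

    Hypothetical helper: works for probabilities or log-probabilities alike,
    since both give the same ordering.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical usage, ASSUMING the loaded hyperparameters expose a SpeechBrain
# CategoricalEncoder as classifier.hparams.label_encoder:
# out_prob, score, index, text_lab = classifier.classify_file("...us.wav")
# enc = classifier.hparams.label_encoder
# labels = [enc.ind2lab[i] for i in sorted(enc.ind2lab)]
# print(top_k_accents(out_prob.flatten().tolist(), labels))
```

If the label encoder is stored under a different name in the model's hyperparameters, only the labels line needs to change; the ranking helper itself does not depend on SpeechBrain.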
Inference on GPU
To perform inference on the GPU, add run_opts={"device":"cuda"} when calling foreign_class (the same option is accepted by the from_hparams method of the standard SpeechBrain interfaces).
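The sketch below falls back to the CPU when no GPU is visible, instead of hard-coding "cuda". accent_run_opts and load_classifier are hypothetical helper names. The import fallback assumes SpeechBrain 1.0 or later exposes foreign_class under speechbrain.inference.interfaces, while older releases (as in the example above) use speechbrain.pretrained.interfaces; run_opts is forwarded by foreign_class to the interface constructor.

```python
import torch

SOURCE = "Jzuluaga/accent-id-commonaccent_xlsr-en-english"


def accent_run_opts(prefer_gpu: bool = True) -> dict:
    """Build the run_opts dict: CUDA if requested and available, else CPU."""
    use_cuda = prefer_gpu and torch.cuda.is_available()
    return {"device": "cuda" if use_cuda else "cpu"}


def load_classifier(prefer_gpu: bool = True):
    """Load the accent classifier on the chosen device (downloads the model)."""
    try:
        # Assumed location in SpeechBrain >= 1.0
        from speechbrain.inference.interfaces import foreign_class
    except ImportError:
        # Older SpeechBrain releases, as used in the example above
        from speechbrain.pretrained.interfaces import foreign_class
    return foreign_class(
        source=SOURCE,
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
        run_opts=accent_run_opts(prefer_gpu),
    )
```

Calling load_classifier() and then classify_file as above runs on the GPU when one is available and on the CPU otherwise.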
Documentation
Training
The model was trained with SpeechBrain. To train it from scratch, follow these steps:
- Clone SpeechBrain:
git clone https://github.com/speechbrain/speechbrain/
- Install it:
cd speechbrain
pip install -r requirements.txt
pip install -e .
- Clone our repository (https://github.com/JuanPZuluaga/accent-recog-slt2022) and run the training script:
git clone https://github.com/JuanPZuluaga/accent-recog-slt2022
cd accent-recog-slt2022/CommonAccent/accent_id
python train_w2v2.py hparams/train_w2v2.yaml
You can find our training results (models, logs, etc.) on this repository's Files and versions page.
Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
Technical Details
Pipeline description
This system is composed of a fine-tuned XLSR model coupled with statistical pooling. A classifier, trained with the negative log-likelihood (NLL) loss, is applied on top of that.
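To make the pooling and loss concrete, here is a minimal plain-PyTorch sketch of a classification head: statistics pooling over time (mean, optionally concatenated with standard deviation), a linear layer, log-softmax, and one NLL training step. It is an illustration, not the recipe's code: the recipe presumably builds on SpeechBrain's own modules (e.g. speechbrain.nnet.pooling.StatisticsPooling), may pool the mean only, and also fine-tunes the XLSR encoder, which is omitted here. StatsPoolingClassifier and nll_training_step are hypothetical names; feat_dim=1024 is an assumption matching the hidden size of large wav2vec 2.0/XLSR checkpoints, and 16 classes matches the 16 English accents.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StatsPoolingClassifier(nn.Module):
    """Statistics pooling over time, then a linear layer and log-softmax."""

    def __init__(self, feat_dim: int = 1024, n_classes: int = 16, use_std: bool = True):
        super().__init__()
        self.use_std = use_std
        in_dim = 2 * feat_dim if use_std else feat_dim
        self.out = nn.Linear(in_dim, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) frame-level encoder outputs
        pooled = feats.mean(dim=1)
        if self.use_std:
            # Append the per-dimension standard deviation over time
            pooled = torch.cat([pooled, feats.std(dim=1)], dim=-1)
        # Log-probabilities, which is what the NLL loss expects
        return F.log_softmax(self.out(pooled), dim=-1)


def nll_training_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                      feats: torch.Tensor, targets: torch.Tensor) -> float:
    """Run one optimisation step with the negative log-likelihood loss."""
    optimizer.zero_grad()
    loss = F.nll_loss(model(feats), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = StatsPoolingClassifier(feat_dim=1024, n_classes=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    feats = torch.randn(4, 50, 1024)  # random stand-in for XLSR frame features
    targets = torch.randint(0, 16, (4,))  # random accent indices
    print(nll_training_step(model, opt, feats, targets))
```

Pairing log_softmax with nll_loss is equivalent to applying cross-entropy to the raw logits; keeping them separate makes the classifier output log-probabilities directly at inference time.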
The system was trained on recordings sampled at 16 kHz (single channel). When you call classify_file, the code automatically normalizes your audio if needed (i.e., resampling and mono channel selection). If you use encode_batch or classify_batch instead, make sure your input tensor already uses the expected sampling rate.
License
This project is licensed under the MIT license.
Citing
Cite our work: CommonAccent
If you find this work useful, please cite it as:
@article{zuluaga2023commonaccent,
title={CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice},
author={Zuluaga-Gomez, Juan and Ahmed, Sara and Visockas, Danielius and Subakan, Cem},
journal={Interspeech 2023},
url={https://arxiv.org/abs/2305.18283},
year={2023}
}
Cite XLSR model
@article{conneau2020unsupervised,
title={Unsupervised cross-lingual representation learning for speech recognition},
author={Conneau, Alexis and Baevski, Alexei and Collobert, Ronan and Mohamed, Abdelrahman and Auli, Michael},
journal={arXiv preprint arXiv:2006.13979},
year={2020}
}
Cite SpeechBrain
Please cite SpeechBrain if you use it for your research or business.
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}