accent-id-commonaccent_xlsr-en-english Open-source English Accent Classification System


Developed by Jzuluaga

High-accuracy English accent classification system based on the XLSR model. It recognizes 16 English accents with accuracy of up to 95%.

Audio Classification

PyTorch

English | Open Source License: MIT | #English accent recognition #XLSR pre-training #Multinational accent classification

Downloads 333

Release Time : 8/4/2023

Model Overview

This model classifies English accents by fine-tuning the XLSR architecture and adding a statistical pooling layer and a linear classifier on top. It is intended to support accent-aware English speech recognition.

Model Features

Multi-accent recognition

Supports classification of 16 English accents, including mainstream accents such as American, British, and Indian

High accuracy

Achieves 95% classification accuracy on the CommonAccent dataset

Pre-trained model integration

Fine-tuned based on the XLSR large-scale acoustic pre-trained model, featuring powerful feature extraction capabilities

Hierarchical clustering

t-SNE visualization shows that embedding vectors automatically form cluster structures based on phonetic similarity

Model Capabilities

English accent classification

Speech feature extraction

Short speech analysis

Use Cases

Speech recognition enhancement

ASR system accent adaptation

Provides accent information to automatic speech recognition systems, which can reduce recognition errors caused by accented speech

Speech analysis

Speaker characteristic analysis

Analyzes the speaker's regional background features through accent recognition

🚀 CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

English Accent Classifier with XLSR model

This project addresses the problem of accented speech recognition in Automatic Speech Recognition (ASR) systems. By integrating accent information, it aims to mitigate errors caused by accented speech. It uses ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures for multilingual accent classification and provides tools for English accent classification with high accuracy.

🚀 Quick Start

This repository provides all the necessary tools to perform accent identification from speech recordings with SpeechBrain. The system uses a model pretrained on the CommonAccent dataset in English (16 accents).

✨ Features

Multilingual Accent Classification: Addresses multilingual accent classification through ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures.
High Accuracy: Establishes a new state of the art for English accent classification, with accuracy as high as 95%.
Simple Recipe: Introduces a simple-to-follow recipe aligned with the SpeechBrain toolkit for accent classification based on Common Voice datasets.

📦 Installation

Install SpeechBrain

First, install SpeechBrain with the following command:

pip install speechbrain

We encourage you to read the SpeechBrain tutorials to learn more about the toolkit.

💻 Usage Examples

Perform Accent Identification from Speech Recordings

import torchaudio
from speechbrain.pretrained.interfaces import foreign_class

classifier = foreign_class(source="Jzuluaga/accent-id-commonaccent_xlsr-en-english", pymodule_file="custom_interface.py", classname="CustomEncoderWav2vec2Classifier")

# US Accent Example
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/us.wav')
print(text_lab)

# Philippines Example
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/philippines.wav')
print(text_lab)
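When several recordings need labelling, the call above can be wrapped in a small loop. The following is a minimal sketch and not part of the original model card: `classify_files` and `accent_counts` are hypothetical helper names, and the code assumes that `classify_file` returns the four-tuple shown in the usage example, with `text_lab` holding the predicted label as the first element of a list.

```python
from collections import Counter


def classify_files(classifier, paths):
    """Return {path: predicted accent label} for each audio file.

    `classifier` is assumed to expose classify_file(path) returning
    (out_prob, score, index, text_lab), as in the usage example.
    """
    labels = {}
    for path in paths:
        _out_prob, _score, _index, text_lab = classifier.classify_file(path)
        # Assumption: text_lab is a list with one label per input file
        if isinstance(text_lab, (list, tuple)):
            labels[path] = text_lab[0]
        else:
            labels[path] = text_lab
    return labels


def accent_counts(labels):
    """Count how many files were assigned to each accent label."""
    return Counter(labels.values())
```

With the classifier loaded as above, `classify_files(classifier, ["a.wav", "b.wav"])` would return one label per path (the file names here are placeholders), and `accent_counts` summarizes the distribution.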

Inference on GPU

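To perform inference on the GPU, add run_opts={"device":"cuda"} when loading the classifier, that is, in the foreign_class call used above (the same option applies to the from_hparams method of other SpeechBrain interfaces).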
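As a hedged illustration of the GPU option, the sketch below picks a device and passes it when loading the model. `pick_run_opts` and `load_classifier` are hypothetical helper names, and the code assumes that `foreign_class` forwards `run_opts` to the interface in the same way `from_hparams` does.

```python
def pick_run_opts(prefer_gpu=True):
    """Build the run_opts dict, falling back to CPU when CUDA is unavailable."""
    device = "cpu"
    if prefer_gpu:
        try:
            import torch

            if torch.cuda.is_available():
                device = "cuda"
        except ImportError:
            # PyTorch not installed: stay on CPU
            pass
    return {"device": device}


def load_classifier(run_opts):
    """Load the accent classifier on the device named in run_opts."""
    # Imported lazily so pick_run_opts works without SpeechBrain installed
    from speechbrain.pretrained.interfaces import foreign_class

    return foreign_class(
        source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
        pymodule_file="custom_interface.py",
        classname="CustomEncoderWav2vec2Classifier",
        run_opts=run_opts,  # assumption: forwarded to the interface constructor
    )
```

Calling `load_classifier(pick_run_opts())` would then use the GPU when one is visible to PyTorch and the CPU otherwise.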

📚 Documentation

Training

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Clone our repository from https://github.com/JuanPZuluaga/accent-recog-slt2022 and run the training script:

git clone https://github.com/JuanPZuluaga/accent-recog-slt2022
cd accent-recog-slt2022/CommonAccent/accent_id
python train_w2v2.py hparams/train_w2v2.yaml

You can find our training results (models, logs, etc) in this repository's Files and versions page.

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

🔧 Technical Details

Pipeline description

This system is composed of a fine-tuned XLSR model coupled with statistical pooling. A classifier, trained with NLL Loss, is applied on top of that.
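To make the pooling and loss steps concrete, here is a toy sketch in plain Python. It is not the SpeechBrain implementation: the function names are hypothetical, the frame features stand in for XLSR outputs, and the standard deviation uses the population form (dividing by the number of frames), which is an assumption.

```python
import math


def statistics_pooling(frames):
    """Collapse a list of frame vectors (time x features) into [means..., stds...]."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation per feature (assumed normalization)
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def linear(x, weights, bias):
    """Apply a linear classifier: one row of weights and one bias per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class index."""
    return -log_probs[target]
```

Pooling turns a variable-length sequence of frames into a fixed-size vector, which is what lets a single linear layer score each accent class for utterances of any duration.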

The system is trained with recordings sampled at 16kHz (single channel). The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling classify_file if needed. Make sure your input tensor is compliant with the expected sampling rate if you use encode_batch and classify_batch.
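When bypassing classify_file, the normalization has to be done by the caller. The sketch below shows one way to do it under stated assumptions: `prepare_waveform` is a hypothetical helper, the input is taken to be a torch.Tensor shaped (channels, time) as returned by torchaudio.load, and channels are averaged rather than selected.

```python
TARGET_SAMPLE_RATE = 16000  # rate the model was trained on, per the text above


def prepare_waveform(waveform, sample_rate, target_rate=TARGET_SAMPLE_RATE):
    """Downmix to mono and resample to target_rate when needed.

    `waveform` is assumed to be a torch.Tensor shaped (channels, time).
    A multi-channel input comes back with shape (1, time).
    """
    if waveform.dim() == 2 and waveform.size(0) > 1:
        # Averaging channels is one common downmix choice (an assumption here)
        waveform = waveform.mean(dim=0, keepdim=True)
    if sample_rate != target_rate:
        # Imported lazily: only needed when the rates differ
        import torchaudio

        waveform = torchaudio.functional.resample(waveform, sample_rate, target_rate)
    return waveform
```

A typical use would be to load a file with torchaudio.load, pass the signal and its sampling rate through `prepare_waveform`, and hand the resulting (1, time) tensor to classify_batch as a batch of one.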

📄 License

This project is licensed under the MIT license.

📝 Citing

Cite our work: CommonAccent

If you find this work useful, please cite it as:

@article{zuluaga2023commonaccent,
  title={CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice},
  author={Zuluaga-Gomez, Juan and Ahmed, Sara and Visockas, Danielius and Subakan, Cem},
  journal={Interspeech 2023},
  url={https://arxiv.org/abs/2305.18283},
  year={2023}
}

Cite XLSR model

@article{conneau2020unsupervised,
  title={Unsupervised cross-lingual representation learning for speech recognition},
  author={Conneau, Alexis and Baevski, Alexei and Collobert, Ronan and Mohamed, Abdelrahman and Auli, Michael},
  journal={arXiv preprint arXiv:2006.13979},
  year={2020}
}

Cite SpeechBrain

Please cite SpeechBrain if you use it for your research or business.

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
