accent-id-commonaccent open-source model: free 16-way English accent classification with 87% accuracy

Accent Id Commonaccent Ecapa

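Developed by Jzuluaga

This model uses the ECAPA-TDNN architecture to classify English speech into 16 accents. It was trained on the CommonAccent dataset and reaches 87% accuracy on the test set.

Audio classification

PyTorch

Language: English · License: MIT · #english-accent-recognition #ecapa-tdnn #multi-accent-classification

Downloads: 2,291

Release date: 1/8/2023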

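Model overview

This is a speech accent identification model that recognizes 16 different accents in English speech recordings. It is based on the ECAPA-TDNN architecture and trained on the CommonAccent dataset, and it can be used to help automatic speech recognition systems handle accented speech.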

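Model features

High accuracy

Reaches 87% accuracy on the test set, ahead of the fine-tuned variant without data augmentation (85%) and the model trained from scratch (82%).

Multi-accent support

Identifies 16 different English accents.

Data augmentation

Uses data augmentation to improve generalization.

Transfer learning

Fine-tuned from a model pretrained on VoxCeleb.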

Model capabilities

Speech classification

Accent identification

English speech processing

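Use cases

ASR enhancement

Improve ASR recognition of accented speech

Identify the speaker's accent and use it to tune the ASR system's recognition parameters

May improve how well an ASR system handles accented speech

Speech analysis

Speaker accent analysis

Analyze accent characteristics in speech samples

Useful for linguistic research or user profiling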

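🚀 Accent identification from speech recordings with ECAPA-TDNN embeddings on CommonAccent

This project uses SpeechBrain to identify accents in speech recordings. The system uses a model pretrained on the CommonAccent English dataset (16 accents) and is built on the CommonLanguage recipe (see https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonLanguage ). It identifies 16 different accents in English speech, which supports making speech recognition systems more inclusive of accented speakers.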

🚀 Quick start

This repository provides all the tools needed to run accent identification on speech recordings with SpeechBrain.

Given a short English (EN) speech recording, the system can identify the following 16 accents:

african
australia
bermuda
canada
england
hongkong
indian
ireland
malaysia
newzealand
philippines
scotland
singapore
southatlandtic
us
wales

GitHub repository: https://github.com/JuanPZuluaga/accent-recog-slt2022

For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the test set is:

Release date (dd/mm/yyyy)	Accuracy (%)
01-08-2023 (this model)	87
01-08-2023 (this model, trained without data augmentation)	85
01-08-2023 (this model, trained from scratch with no parameter transfer)	82
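
Accuracy differences of a few points are easier to compare as error rates. The sketch below reworks the three figures from the table into relative error reductions. The function names are illustrative, and the table does not say whether the from-scratch model used data augmentation, so the second figure compares it only against the best model rather than attributing the gain to transfer learning alone.

```python
# Test accuracies (%) as reported in the table above.
ACCURACY = {
    "fine-tuned + augmentation": 87.0,
    "fine-tuned, no augmentation": 85.0,
    "from scratch": 82.0,
}


def error_rate(accuracy_pct: float) -> float:
    """Error rate in percent for a given accuracy in percent."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Fraction of the baseline's errors that the improved model removes."""
    base_err = error_rate(baseline_acc)
    return (base_err - error_rate(improved_acc)) / base_err


# Effect of augmentation among the two fine-tuned models: 15% -> 13% error.
aug_gain = relative_error_reduction(
    ACCURACY["fine-tuned, no augmentation"], ACCURACY["fine-tuned + augmentation"]
)
# From-scratch model versus the best model: 18% -> 13% error.
overall_gain = relative_error_reduction(
    ACCURACY["from scratch"], ACCURACY["fine-tuned + augmentation"]
)

print(f"augmentation: {aug_gain:.1%} fewer errors")
print(f"from scratch to best model: {overall_gain:.1%} fewer errors")
```

Under these numbers, augmentation removes roughly one in seven of the remaining errors, while the best model makes a bit over a quarter fewer errors than the from-scratch model.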

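✨ Key features

Advanced architecture: uses the Emphasized Channel Attention, Propagation and Aggregation Time Delay Neural Network (ECAPA-TDNN) architecture, which performs well across a range of speech tasks.
Model comparison: three models are compared: one trained from scratch, one fine-tuned with data augmentation, and a fine-tuned baseline. The model fine-tuned with data augmentation performs best.
Cluster analysis: a t-SNE projection of the embeddings shows some clustering by phonetic similarity.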

📦 Installation

Install SpeechBrain

First, install SpeechBrain with the following command:

pip install speechbrain

We recommend reading the SpeechBrain tutorials to learn more.

💻 Usage examples

Basic usage

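import torchaudio
# In SpeechBrain 1.0 and later the same class is also exposed as
# speechbrain.inference.classifiers.EncoderClassifier.
from speechbrain.pretrained import EncoderClassifier
# Download the pretrained model (cached under savedir) and build the classifier.
classifier = EncoderClassifier.from_hparams(source="Jzuluaga/accent-id-commonaccent_ecapa", savedir="pretrained_models/accent-id-commonaccent_ecapa")
# Irish accent example
# classify_file returns per-class scores, the best score, its index, and the predicted label.
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_ecapa/data/ireland_1.wav')
print(text_lab)

# Malaysian accent example
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_ecapa/data/malaysia_1.wav')
print(text_lab)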
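
The predicted label alone hides how close the runners-up were. As a minimal sketch, the helper below ranks per-class scores and returns the top candidates. It assumes you have already converted the scores to a plain list (for example with `out_prob.squeeze(0).tolist()`, assuming `out_prob` has shape `[1, n_classes]`) and that `labels` follows the model's own class-index order, which is defined by the model's label encoder and is not reproduced here. The function name and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_k_accents(
    scores: Sequence[float], labels: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values for illustration only; these are not real model outputs,
# and the label order is not the model's actual class order.
toy_labels = ["england", "ireland", "scotland", "us"]
toy_scores = [-1.9, -0.4, -1.2, -3.0]

print(top_k_accents(toy_scores, toy_labels, k=2))
```

The helper only relies on higher scores being better, so it works the same way whether the scores are log-posteriors or raw similarities.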

Advanced usage

Inference on GPU

To run inference on a GPU, pass run_opts={"device":"cuda"} when calling from_hparams.
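
A script that hard-codes `"cuda"` fails on machines without a GPU. The sketch below picks the device at run time and falls back to CPU. `pick_run_opts` and `load_classifier` are hypothetical helper names; the `from_hparams` arguments are the ones shown in this document. Loading the model downloads it, so that call is left commented out.

```python
def pick_run_opts() -> dict:
    """Return run_opts that select CUDA only when PyTorch reports a usable GPU."""
    try:
        import torch

        use_cuda = torch.cuda.is_available()
    except ImportError:
        # PyTorch is not installed; fall back to CPU so the choice is still defined.
        use_cuda = False
    return {"device": "cuda" if use_cuda else "cpu"}


def load_classifier(run_opts: dict):
    """Load the accent classifier on the chosen device (downloads the model)."""
    # Imported lazily so the device check above works without SpeechBrain.
    from speechbrain.pretrained import EncoderClassifier

    return EncoderClassifier.from_hparams(
        source="Jzuluaga/accent-id-commonaccent_ecapa",
        savedir="pretrained_models/accent-id-commonaccent_ecapa",
        run_opts=run_opts,
    )


run_opts = pick_run_opts()
print(run_opts)
# classifier = load_classifier(run_opts)  # uncomment to download and load the model
```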

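📚 Documentation

Abstract

Recognizing accented speech remains a major problem for automatic speech recognition (ASR) systems. This project approaches the classification of accented English speech with the Emphasized Channel Attention, Propagation and Aggregation Time Delay Neural Network (ECAPA-TDNN) architecture, which has performed well on a variety of speech tasks. Three models are proposed: one trained from scratch, and two fine-tuned from the speechbrain/spkrec-ecapa-voxceleb (VoxCeleb) checkpoint, one with data augmentation and one baseline. The model fine-tuned with data augmentation gives the best results. Most misclassifications are systematic and expected given how similar some accents are, for example US and Canadian accents. A t-SNE projection (a dimensionality reduction technique) of the embeddings also shows some clustering by phonetic similarity. In future work, the authors plan to explore integrating this accent classifier into the proposed framework to make ASR systems more inclusive of accented speech and thereby improve their performance.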
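
One way to check whether misclassifications are systematic is to count which (true, predicted) pairs occur most often among the errors. The sketch below does that with the standard library. The function name is hypothetical, and the pairs are made-up values used only to show the mechanics; they are not results from this model.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def most_confused_pairs(pairs: Iterable[Pair], top: int = 3) -> List[Tuple[Pair, int]]:
    """Count (true, predicted) pairs with a wrong prediction, most frequent first."""
    errors = Counter((true, pred) for true, pred in pairs if true != pred)
    return errors.most_common(top)


# Made-up (true, predicted) pairs for illustration; not results from the model.
toy_results = [
    ("us", "us"),
    ("us", "canada"),
    ("canada", "us"),
    ("us", "canada"),
    ("england", "england"),
    ("scotland", "ireland"),
    ("canada", "canada"),
]

for (true, pred), count in most_confused_pairs(toy_results):
    print(f"{true} -> {pred}: {count}")
```

With real predictions, a few pairs dominating this list would match the pattern the abstract describes.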

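Pipeline description

The system consists of an ECAPA model coupled with statistical pooling. A classifier trained with categorical cross-entropy loss is applied on top of it.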
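
To make the two building blocks concrete, here is a dependency-free sketch of plain statistics pooling and of categorical cross-entropy for a single example. It is an illustration under simplifying assumptions, not the model's implementation: ECAPA-TDNN uses an attention-weighted form of statistics pooling, whereas this version weights all frames equally and uses the population standard deviation. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse T frames of C features into one 2C vector: channel means, then stds."""
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_channels = len(frames[0])
    means = [sum(f[c] for f in frames) / n_frames for c in range(n_channels)]
    stds = [
        math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / n_frames)
        for c in range(n_channels)
    ]
    return means + stds


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Categorical cross-entropy of one example: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max before exponentiating for stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


# Two frames with two channels each: a variable-length input becomes a fixed 4-dim vector.
pooled = statistics_pooling([[1.0, 2.0], [3.0, 4.0]])
print(pooled)

# Two equally likely classes give a loss of ln(2).
loss = cross_entropy([0.0, 0.0], target=0)
print(loss)
```

Pooling is what lets recordings of different lengths map to a fixed-size embedding that the classifier can consume.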

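The system is trained on recordings sampled at 16 kHz (single channel). When you call classify_file, the code automatically normalizes the audio if needed (that is, resampling plus mono channel selection). If you use encode_batch or classify_batch, make sure your input tensor matches the expected sampling rate.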
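
When you bypass classify_file, the two normalization steps become your responsibility. The sketch below shows them on plain Python lists: averaging channels to mono and resampling to 16 kHz. It is only an illustration of the idea. The helper names are hypothetical, averaging is one possible way to obtain a single channel, and the linear-interpolation resampler has no anti-aliasing filter, so for real audio a proper resampler such as `torchaudio.functional.resample` is the better choice.

```python
from typing import List, Sequence

TARGET_SAMPLE_RATE = 16_000  # the rate the model was trained on, per the text above


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels sample by sample into one mono signal."""
    n_channels = len(channels)
    return [sum(samples) / n_channels for samples in zip(*channels)]


def resample_linear(
    signal: Sequence[float], orig_rate: int, new_rate: int = TARGET_SAMPLE_RATE
) -> List[float]:
    """Resample by linear interpolation (no anti-aliasing; illustration only)."""
    if orig_rate == new_rate or len(signal) < 2:
        return list(signal)
    n_out = max(1, round(len(signal) * new_rate / orig_rate))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis, clamped to the signal.
        pos = min(i * orig_rate / new_rate, float(last))
        left = min(int(pos), last - 1)
        frac = pos - left
        out.append(signal[left] * (1.0 - frac) + signal[left + 1] * frac)
    return out


mono = to_mono([[1.0, 3.0], [3.0, 5.0]])
print(mono)

halved = resample_linear([0.0, 1.0, 2.0, 3.0], orig_rate=32_000)
print(halved)
```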

Training

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Clone the project repository and run training:

git clone https://github.com/JuanPZuluaga/accent-recog-slt2022
cd accent-recog-slt2022/CommonAccent/accent_id
python train.py hparams/train_ecapa_tdnn.yaml

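Training results (models, logs, etc.) are available on this repository's Files and versions page.

Limitations

The SpeechBrain team does not provide any guarantee about this model's performance on other datasets.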

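🔧 Technical details

The system is based on the ECAPA-TDNN architecture, combined with statistical pooling and a classifier trained with categorical cross-entropy loss. Trained on the CommonAccent dataset, it identifies 16 accents in English speech. Different training strategies were explored, including training from scratch and fine-tuning, and results with and without data augmentation were compared. A t-SNE analysis of the embeddings' internal structure showed clustering by phonetic similarity.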

📄 License

This project is released under the MIT license.

Citation

Citing the CommonAccent work

If you find this work useful, please cite it as follows:

@article{zuluaga2023commonaccent,
  title={CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice},
  author={Zuluaga-Gomez, Juan and Ahmed, Sara and Visockas, Danielius and Subakan, Cem},
  journal={Interspeech 2023},
  url={https://arxiv.org/abs/2305.18283},
  year={2023}
}

Citing the ECAPA-TDNN model

@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
  author    = {Brecht Desplanques and
               Jenthe Thienpondt and
               Kris Demuynck},
  editor    = {Helen Meng and
               Bo Xu and
               Thomas Fang Zheng},
  title     = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
               in {TDNN} Based Speaker Verification},
  booktitle = {Interspeech 2020},
  pages     = {3830--3834},
  publisher = {{ISCA}},
  year      = {2020},
}

Citing SpeechBrain

If you use SpeechBrain in your research or business, please cite it as follows:

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}