accent-id-commonaccent_xlsr-en-english: Open-Source English Accent Classification System

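Developed by Jzuluaga

A high-accuracy English accent classification system built on the XLSR model. It distinguishes 16 English accents and reaches up to 95% accuracy.

Audio classification

PyTorch

English  Open-source license: MIT  #English accent identification  #XLSR pretraining  #Multinational accent classification

Downloads: 333

Release date: 8/4/2023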

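Model Overview

This model fine-tunes the XLSR architecture for English accent classification. It combines a statistical pooling layer with a linear classifier and is designed specifically for tasks involving accented English speech.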

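Model Features

Multi-accent identification

Classifies 16 widely spoken English accents, including American, British, and Indian.

High accuracy

Reaches up to 95% classification accuracy on the CommonAccent dataset.

Pretrained model integration

Fine-tunes the large-scale XLSR acoustic pretrained model, which provides strong feature extraction.

Hierarchical clustering

t-SNE visualization shows the embedding vectors forming clusters that follow phonological similarity between accents.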

Model Capabilities

English accent classification

Speech feature extraction

Short-utterance audio analysis

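Use Cases

Speech recognition enhancement

Accent adaptation for ASR systems

Supplies accent information to automatic speech recognition systems to reduce recognition errors.

Can lower ASR error rates caused by accented speech.

Speech analysis

Speaker characteristic analysis

Uses accent identification to analyze a speaker's regional background.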
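For speaker-level analysis, a single short clip can be noisy, so one plausible approach is to classify several segments from the same speaker and take the most frequent label. The sketch below is an assumption about how such aggregation could be done, not something the model card prescribes; `dominant_accent` is a hypothetical helper and the label list is made-up illustration data.

```python
from collections import Counter


def dominant_accent(segment_labels):
    """Return (label, share) for the most frequent label across segments."""
    if not segment_labels:
        raise ValueError("need at least one segment label")
    counts = Counter(segment_labels)
    label, count = counts.most_common(1)[0]
    # share = fraction of segments that agree with the winning label
    return label, count / len(segment_labels)


# Hypothetical per-segment predictions for one speaker
labels = ["scotland", "england", "scotland", "scotland", "ireland"]
print(dominant_accent(labels))  # ('scotland', 0.6)
```

The returned share gives a rough indication of how consistent the per-segment predictions were; a low share suggests the result should be treated with caution.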

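🚀 CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice

English accent classifier using the XLSR model

This project addresses the problem of recognizing accented speech in automatic speech recognition (ASR). Incorporating accent information into an ASR framework has been shown to reduce recognition errors on accented speech. This repository provides tools for accent classification on the Common Voice dataset using the SpeechBrain toolkit.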

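🚀 Quick Start

This system identifies the accent in an English speech recording using a model already trained on the CommonAccent dataset. Follow the steps below to run accent identification.

Installing the required library

First, install SpeechBrain.

pip install speechbrain

Running accent identification

The following Python code identifies the accent in a speech recording.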

import torchaudio
from speechbrain.pretrained.interfaces import foreign_class

# Load the classifier together with its custom interface class
classifier = foreign_class(source="Jzuluaga/accent-id-commonaccent_xlsr-en-english", pymodule_file="custom_interface.py", classname="CustomEncoderWav2vec2Classifier")

# Example: US accent
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/us.wav')
print(text_lab)

# Example: Philippine accent
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/philippines.wav')
print(text_lab)
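When more than a couple of files need classifying, the two calls above can be wrapped in a loop. The following is a minimal sketch, assuming only that the classification callable returns the same four values as `classify_file` in the snippet above. `classify_many` and `fake_classify_file` are hypothetical names introduced here; the stand-in function exists so the example runs without downloading the model, and the assumption that `text_lab` may arrive as a one-element list is flagged in the code.

```python
def classify_many(classify_file, paths):
    """Map each path to its predicted accent label.

    `classify_file` is any callable returning
    (out_prob, score, index, text_lab), like classifier.classify_file.
    """
    predictions = {}
    for path in paths:
        _out_prob, _score, _index, text_lab = classify_file(str(path))
        # Assumption: text_lab may be a list with one entry per utterance;
        # unwrap it so the result is a plain string.
        if isinstance(text_lab, (list, tuple)):
            text_lab = text_lab[0]
        predictions[str(path)] = text_lab
    return predictions


# Stand-in for the real classifier (illustration only, not a real model)
def fake_classify_file(path):
    label = "us" if "us" in path else "philippines"
    return None, None, None, [label]


result = classify_many(fake_classify_file, ["data/us.wav", "data/philippines.wav"])
print(result)
```

With the real model loaded, `classifier.classify_file` would be passed in place of `fake_classify_file`.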

Inference on GPU

To run inference on a GPU, pass run_opts={"device":"cuda"} when calling foreign_class.

Training

This model was trained with SpeechBrain. Follow the steps below to train it from scratch.

Clone SpeechBrain.

git clone https://github.com/speechbrain/speechbrain/

Install SpeechBrain.

cd speechbrain
pip install -r requirements.txt
pip install -e .

Clone the repository and start training.

git clone https://github.com/JuanPZuluaga/accent-recog-slt2022
cd accent-recog-slt2022/CommonAccent/accent_id
python train_w2v2.py hparams/train_w2v2.yaml

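✨ Key Features

Multilingual accent classification: uses the ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures to classify accents across multiple languages.
High accuracy: reaches up to 95% accuracy on English accent classification.
Automatic audio normalization: the code normalizes audio automatically when needed (resampling + mono channel selection).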
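To make the normalization step concrete, here is a rough pure-Python sketch of what "resampling + mono channel selection" means for a multi-channel recording. It is an illustration under stated assumptions, not the code the library runs: it keeps the first channel (the actual normalizer may select or mix channels differently), it uses plain linear interpolation without an anti-aliasing filter, and all function names are hypothetical. The 16 kHz target comes from the training setup described in this document.

```python
def select_first_channel(frames):
    """frames: list of per-sample tuples, one value per channel."""
    return [frame[0] for frame in frames]


def resample_linear(signal, src_rate, dst_rate):
    """Resample by linear interpolation (simple; no anti-aliasing filter)."""
    if src_rate == dst_rate or len(signal) < 2:
        return list(signal)
    n_out = int(len(signal) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out


def normalize(frames, src_rate, target_rate=16000):
    """Mono selection followed by resampling to the target rate."""
    return resample_linear(select_first_channel(frames), src_rate, target_rate)


# Toy stereo clip at 32 kHz: channel 0 ramps up, channel 1 is constant
stereo = [(float(i), 9.0) for i in range(8)]
print(normalize(stereo, src_rate=32000))  # [0.0, 2.0, 4.0, 6.0]
```

Because the classifier handles this internally, the sketch is only useful for understanding what happens to a 44.1 kHz stereo file before it reaches the model.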

💻 Usage Examples

Advanced usage

To run inference on a GPU, add run_opts={"device":"cuda"} as shown below.

import torchaudio
from speechbrain.pretrained.interfaces import foreign_class

classifier = foreign_class(source="Jzuluaga/accent-id-commonaccent_xlsr-en-english", pymodule_file="custom_interface.py", classname="CustomEncoderWav2vec2Classifier", run_opts={"device":"cuda"})

# Classify an audio file
out_prob, score, index, text_lab = classifier.classify_file('Jzuluaga/accent-id-commonaccent_xlsr-en-english/data/us.wav')
print(text_lab)
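Hard-coding "cuda" fails on machines without a GPU. A small helper can choose the device at run time instead. This is a minimal sketch: `pick_run_opts` is a hypothetical name, and it assumes that passing {"device": "cpu"} is acceptable as `run_opts`, which matches SpeechBrain's usual CPU default.

```python
def pick_run_opts():
    """Return run_opts that use CUDA when PyTorch reports it, else CPU."""
    try:
        import torch
        use_cuda = torch.cuda.is_available()
    except ImportError:
        # PyTorch missing: fall back to CPU so the helper still returns a value
        use_cuda = False
    return {"device": "cuda" if use_cuda else "cpu"}


run_opts = pick_run_opts()
print(run_opts)
# Pass the result to foreign_class in place of the literal dict used above:
# classifier = foreign_class(..., run_opts=run_opts)
```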

📚 Documentation

Pipeline description

The system combines a fine-tuned XLSR model with statistical pooling. A classifier trained with NLL loss sits on top. The system was trained on recordings sampled at 16 kHz (single channel).
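The pipeline described here can be illustrated with a toy, dependency-free sketch: frame-level features are pooled into one utterance vector (mean and standard deviation per dimension), a linear layer produces class scores, and the NLL loss is the negative log-probability of the correct class. Everything below is an assumption made for illustration: the two-dimensional features, the zero-initialized weights, the use of population standard deviation, and all function names. The real system uses XLSR features and SpeechBrain layers, whose details may differ.

```python
import math


def statistics_pooling(frames):
    """Concatenate per-dimension mean and std over the time axis."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


def linear(x, weights, bias):
    """One score per class: dot(row, x) + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


# Two frames of 2-dimensional toy features (stand-ins for XLSR outputs)
frames = [[1.0, 2.0], [3.0, 2.0]]
pooled = statistics_pooling(frames)          # [2.0, 2.0, 1.0, 0.0]

# Untrained 3-class classifier: zero weights give a uniform prediction
weights = [[0.0] * len(pooled) for _ in range(3)]
bias = [0.0, 0.0, 0.0]
loss = nll_loss(log_softmax(linear(pooled, weights, bias)), target=1)
print(pooled, loss)
```

With zero weights every class is equally likely, so the loss equals log(3); training would lower it by raising the score of the correct accent.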

Supported accents

The system can identify the following 16 accents from short English speech recordings.

- us
- england
- australia
- indian
- canada
- bermuda
- scotland
- african
- ireland
- newzealand
- wales
- malaysia
- philippines
- singapore
- hongkong
- southatlandtic
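The labels above are the raw strings the classifier returns, including the spelling "southatlandtic". For user-facing output it may help to map them to readable names. The mapping below is a sketch: the label strings are copied from the list above, while the display names and the `display_name` helper are illustrative choices made here, not part of the model.

```python
# Raw labels, copied verbatim from the list above
ACCENT_LABELS = [
    "us", "england", "australia", "indian", "canada", "bermuda",
    "scotland", "african", "ireland", "newzealand", "wales",
    "malaysia", "philippines", "singapore", "hongkong", "southatlandtic",
]

# Display names are assumptions chosen for readability
DISPLAY_NAMES = {
    "us": "United States",
    "england": "England",
    "australia": "Australia",
    "indian": "India",
    "canada": "Canada",
    "bermuda": "Bermuda",
    "scotland": "Scotland",
    "african": "Africa",
    "ireland": "Ireland",
    "newzealand": "New Zealand",
    "wales": "Wales",
    "malaysia": "Malaysia",
    "philippines": "Philippines",
    "singapore": "Singapore",
    "hongkong": "Hong Kong",
    "southatlandtic": "South Atlantic",
}


def display_name(label):
    """Return a readable name, falling back to the raw label if unknown."""
    return DISPLAY_NAMES.get(label, label)


print(display_name("southatlandtic"))  # South Atlantic
```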

🔧 Technical Details

The system uses an XLSR model fine-tuned on the CommonAccent dataset. The ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures have been shown to perform well on a variety of speech-related downstream tasks.

📄 License

This project is released under the MIT license.

Citation

If you use this work, please cite the following references.

Citing CommonAccent

@article{zuluaga2023commonaccent,
  title={CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice},
  author={Zuluaga-Gomez, Juan and Ahmed, Sara and Visockas, Danielius and Subakan, Cem},
  journal={Interspeech 2023},
  url={https://arxiv.org/abs/2305.18283},
  year={2023}
}

Citing the XLSR model

@article{conneau2020unsupervised,
  title={Unsupervised cross-lingual representation learning for speech recognition},
  author={Conneau, Alexis and Baevski, Alexei and Collobert, Ronan and Mohamed, Abdelrahman and Auli, Michael},
  journal={arXiv preprint arXiv:2006.13979},
  year={2020}
}

Citing SpeechBrain

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}