google_speech_command_xvector: open-source voice command recognition model


Google Speech Command Xvector

Developed by speechbrain

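A voice command recognition model trained with SpeechBrain on the Google Speech Commands dataset. It recognizes 12 classes: 10 command keywords plus 'unknown' and 'silence'.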

Speech recognition

PyTorch

English · Open-source license: Apache-2.0 · #short voice command recognition #TDNN architecture #high-accuracy classification

Downloads: 67

Release date: 3/2/2022

Model overview

The system combines a TDNN model with statistical pooling, with a classifier applied on top. It detects a single keyword within a short audio clip.
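The pipeline described above (frame-level TDNN layers, then statistics pooling over time) can be illustrated with a minimal pure-Python sketch. It assumes that a TDNN layer is a 1-D dilated convolution over time followed by a ReLU, and that pooling concatenates the per-dimension mean and population standard deviation. The function names, toy weights, and these exact choices are illustrative assumptions. The real model's layer sizes, activations, and normalization are not shown here.

```python
import statistics


def tdnn_layer(frames, weights, bias, dilation=1):
    """One TDNN layer as a valid 1-D dilated convolution over time.

    frames:  T x D_in      (one feature vector per time step)
    weights: D_out x K x D_in  (K = number of context frames)
    bias:    D_out
    """
    kernel = len(weights[0])
    span = (kernel - 1) * dilation  # how far the context window reaches
    out = []
    for t in range(len(frames) - span):
        row = []
        for w_o, b_o in zip(weights, bias):
            acc = b_o
            for k in range(kernel):
                x = frames[t + k * dilation]  # context frame t + k * dilation
                acc += sum(w * v for w, v in zip(w_o[k], x))
            row.append(max(0.0, acc))  # ReLU (an assumption for this sketch)
        out.append(row)
    return out


def statistics_pooling(frames):
    """Map a T x D sequence to one 2D vector: per-dimension means, then stds."""
    dims = list(zip(*frames))  # D tuples, each holding T values
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means + stds


# Toy input: 4 frames of 1-dim features, one output channel, 2-frame context
frames = [[1.0], [2.0], [3.0], [4.0]]
hidden = tdnn_layer(frames, weights=[[[1.0], [1.0]]], bias=[0.0])
embedding = statistics_pooling(hidden)
print(hidden)  # [[3.0], [5.0], [7.0]]
print(embedding)
```

Whatever the number of frames T, pooling produces a fixed-size vector (2 x D), which is what lets a single classifier handle clips of varying length.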

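Model features

High accuracy

Achieves 98.14% accuracy on the test set

Lightweight

Suitable for embedded devices and real-time applications

Multiple command support

Recognizes 12 classes: 10 voice commands plus 'unknown' and 'silence'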

Model capabilities

Voice command recognition

Keyword spotting

Short audio classification

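Use cases

Smart home control

Voice-based device control

Control smart home devices with voice commands

Recognizes commands such as 'on' and 'off'

In-vehicle systems

In-vehicle voice control

Control in-vehicle systems with voice commands

Recognizes commands such as 'go' and 'stop'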

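🚀 Command Recognition with xvector embeddings on Google Speech Commands

This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands. You can download the dataset here. It provides small training, validation, and test sets for detecting single keywords in short audio clips. The provided system recognizes the following 12 classes (10 keywords plus 'unknown' and 'silence'):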

'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence'
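Two of these 12 labels, 'unknown' and 'silence', are filler classes rather than commands. An application therefore usually needs to separate them from the 10 real keywords before acting on a prediction. The following is a minimal sketch that assumes the predicted label strings match the list above. The tuple order is not necessarily the model's internal class index order, and the ACTIONS mapping is a hypothetical illustration of the smart-home and in-vehicle use cases.

```python
LABELS = ("yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "unknown", "silence")
FILLER_CLASSES = frozenset({"unknown", "silence"})
COMMANDS = tuple(label for label in LABELS if label not in FILLER_CLASSES)

# Hypothetical mapping from recognized keywords to application actions
ACTIONS = {"on": "turn_on", "off": "turn_off", "go": "start", "stop": "halt"}


def is_command(label: str) -> bool:
    """True for the 10 real keywords, False for filler classes or other strings."""
    return label in COMMANDS


def dispatch(label: str):
    """Return the (hypothetical) action for a label, or None to ignore it."""
    return ACTIONS.get(label) if is_command(label) else None


print(len(LABELS), len(COMMANDS))  # 12 10
print(dispatch("on"), dispatch("silence"))  # turn_on None
```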

For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the test set is:

Release	Accuracy (%)
06-02-21	98.14

🚀 Quick start

The system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy loss, is applied on top of that.
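The sketch below computes categorical cross-entropy for a single example, which is the objective the classifier head is trained on: the negative log-softmax probability of the correct class. This is a plain-Python illustration with made-up logits and is not SpeechBrain's training code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax: z_i - logsumexp(z)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, target):
    """Categorical cross-entropy for one example with an integer class target."""
    return -log_softmax(logits)[target]


def predict(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=logits.__getitem__)


uniform = [0.0] * 12  # a head that scores all 12 classes equally
print(cross_entropy(uniform, 3))  # ≈ 2.4849, i.e. log(12)

confident = [0.0] * 12
confident[3] = 10.0  # strongly favours class 3
print(cross_entropy(confident, 3), predict(confident))
```

When the head scores all 12 classes uniformly, the loss is log(12). As the head grows confident in the correct class, the loss approaches 0.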

The system was trained on recordings sampled at 16 kHz (single channel). When needed, the code automatically normalizes your audio (i.e., resampling + mono channel selection) when you call classify_file.
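The rough pure-Python sketch below makes the two normalization operations concrete: downmixing multichannel audio to mono, then resampling to 16 kHz. It assumes channel averaging and uses linear interpolation. SpeechBrain's own normalizer may select or combine channels differently. Production resamplers also apply band-limited (anti-aliasing) filtering, which this sketch omits.

```python
TARGET_SR = 16_000  # the model was trained on 16 kHz, single-channel audio


def to_mono(frames):
    """Average the channels of each sample; frames is a list of per-sample tuples."""
    return [sum(f) / len(f) for f in frames]


def resample_linear(signal, orig_sr, target_sr=TARGET_SR):
    """Crude linear-interpolation resampler (no anti-aliasing filter)."""
    if orig_sr == target_sr:
        return list(signal)
    n_out = round(len(signal) * target_sr / orig_sr)
    out = []
    for j in range(n_out):
        pos = j * orig_sr / target_sr  # position of output sample j in the input
        i = int(pos)
        frac = pos - i
        nxt = signal[min(i + 1, len(signal) - 1)]
        out.append(signal[i] * (1.0 - frac) + nxt * frac)
    return out


def normalize(frames, orig_sr):
    """Downmix to mono, then resample to TARGET_SR."""
    return resample_linear(to_mono(frames), orig_sr)


stereo_44k = [(0.5, -0.5)] * 44_100  # one second of stereo audio at 44.1 kHz
mono_16k = normalize(stereo_44k, 44_100)
print(len(mono_16k))  # 16000
```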

📦 Installation

First, install SpeechBrain with the following command:

pip install speechbrain

For more details, we encourage you to read up on SpeechBrain.

💻 Usage examples

Basic usage

from speechbrain.inference.classifiers import EncoderClassifier

# Download the pretrained model from the Hugging Face Hub (cached in savedir)
classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")

# classify_file returns per-class scores, the best score, its index, and the predicted label(s);
# the example .wav files are fetched from the model repository
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
print(text_lab)
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
print(text_lab)
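classify_file handles one file at a time. To classify several in-memory waveforms at once, EncoderClassifier also provides classify_batch(wavs, wav_lens), where wav_lens are lengths relative to the longest signal in the batch. The sketch below pads a batch and computes those relative lengths. pad_batch and classify_waveforms are hypothetical helpers. The inputs are assumed to already be 16 kHz mono, because the automatic normalization described above is tied to classify_file.

```python
def pad_batch(waveforms):
    """Zero-pad 1-D waveforms to equal length; return (padded, relative_lengths)."""
    max_len = max(len(w) for w in waveforms)
    padded = [list(w) + [0.0] * (max_len - len(w)) for w in waveforms]
    rel_lens = [len(w) / max_len for w in waveforms]
    return padded, rel_lens


def classify_waveforms(classifier, waveforms):
    """Classify several 16 kHz mono waveforms with an EncoderClassifier."""
    import torch  # imported lazily so pad_batch works without torch installed

    padded, rel_lens = pad_batch(waveforms)
    wavs = torch.tensor(padded)  # shape [batch, time]
    wav_lens = torch.tensor(rel_lens)  # relative lengths in (0, 1]
    out_prob, score, index, text_lab = classifier.classify_batch(wavs, wav_lens)
    return text_lab


padded, rel_lens = pad_batch([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6]])
print(rel_lens)  # [1.0, 0.5]
```

With the classifier from the basic example, classify_waveforms(classifier, [wav_a, wav_b]) would return one predicted label per waveform.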

Advanced usage

To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
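The following variation on the GPU option falls back to the CPU when no CUDA device is present. torch.cuda.is_available() and the run_opts argument of from_hparams are real APIs. The helper names pick_device and load_classifier are hypothetical.

```python
MODEL_SOURCE = "speechbrain/google_speech_command_xvector"


def pick_device(cuda_available: bool) -> str:
    """Use the GPU when one is available, otherwise stay on the CPU."""
    return "cuda" if cuda_available else "cpu"


def load_classifier(savedir="pretrained_models/google_speech_command_xvector"):
    """Load the pretrained classifier on the best available device."""
    # Imported lazily so pick_device can be used without torch/speechbrain installed
    import torch
    from speechbrain.inference.classifiers import EncoderClassifier

    device = pick_device(torch.cuda.is_available())
    return EncoderClassifier.from_hparams(
        source=MODEL_SOURCE,
        savedir=savedir,
        run_opts={"device": device},
    )
```

The returned object is used exactly like the classifier in the basic example, e.g. load_classifier().classify_file(...).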

Training

The model was trained with SpeechBrain (b7ff9dc4). To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run training:

cd recipes/Google-speech-commands
python train.py hparams/xvect.yaml --data_folder=your_data_folder

You can find our training results (models, logs, etc.) here.


📄 License

This project is licensed under the Apache 2.0 license.

References

Referencing xvectors

@inproceedings{snyder2018spoken,
  author    = {David Snyder and
               Daniel Garcia{-}Romero and
               Alan McCree and
               Gregory Sell and
               Daniel Povey and
               Sanjeev Khudanpur},
  title     = {Spoken Language Recognition using X-vectors},
  booktitle = {Odyssey 2018},
  pages     = {105--111},
  year      = {2018},
}

Referencing Google Speech Commands

@article{warden2018speech,
  author = { {Warden}, P.},
  title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1804.03209},
  primaryClass = "cs.CL",
  keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
  year = 2018,
  month = apr,
  url = {https://arxiv.org/abs/1804.03209},
}

👀 About SpeechBrain

Website: https://speechbrain.github.io/
Code: https://github.com/speechbrain/speechbrain/
HuggingFace: https://huggingface.co/speechbrain/

📝 Citing SpeechBrain

Please cite SpeechBrain if you use it for your research or business.

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}