google_speech_command_xvector: an open-source speech command recognition model

Google Speech Command Xvector

Developed by speechbrain

A speech command recognition model trained with SpeechBrain on the Google Speech Commands dataset. It recognizes 12 keywords.

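Speech recognition

PyTorch

Language: English | License: Apache-2.0 | #short speech command recognition #TDNN architecture #high-accuracy classification

Downloads: 67

Released: 3/2/2022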

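Model overview

The system consists of a TDNN model combined with statistical pooling, with a classifier applied on top. It detects a single keyword in a short audio clip.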

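Model features

High accuracy

Reaches 98.14% accuracy on the test set

Lightweight

Suitable for embedded devices and real-time applications

Multiple commands

Recognizes 12 classes: ten command words plus 'unknown' and 'silence'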

Model capabilities

Speech command recognition

Keyword detection

Short audio classification

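Use cases

Smart home control

Voice-controlled devices

Control smart home devices with voice commands

Recognizes commands such as 'on' and 'off'

In-vehicle systems

In-vehicle voice control

Control in-vehicle systems with voice commands

Recognizes commands such as 'go' and 'stop'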

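🚀 Command recognition with xvector embeddings on the Google Speech Commands dataset

This project provides all the tools needed to perform command recognition with SpeechBrain, using a model pretrained on the Google Speech Commands dataset. The dataset is publicly available for download. It provides small training, validation, and test sets for detecting single keywords in short audio clips. The system recognizes the following 12 keywords: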

'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence'
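
The label set above can be read as ten command words plus two catch-all classes. The following is a minimal sketch of such a mapping, assuming that any word outside the ten commands is folded into 'unknown'. The name to_target_label is hypothetical and is not part of SpeechBrain.

```python
# The ten command words from the list above.
COMMANDS = ("yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go")


def to_target_label(word: str) -> str:
    """Map a raw word label to one of the 12 target classes (illustrative)."""
    w = word.strip().lower()
    if w in COMMANDS:
        return w
    if w == "silence":
        return "silence"
    # Assumption: every other spoken word counts as "unknown".
    return "unknown"
```

For example, to_target_label("bed") falls through to the 'unknown' class, while to_target_label("Stop") is returned as 'stop'.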

For a better experience, we encourage you to learn more about SpeechBrain. The model's performance on the test set is:

Release	Accuracy (%)
06-02-21	98.14

🚀 Quick start

The pretrained model can be used for command recognition directly through SpeechBrain, as described in the sections that follow.

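✨ Main features

Recognizes 12 keywords, including 'yes', 'no', and 'up'.
Reaches 98.14% accuracy on the test set.
The code automatically normalizes the audio (resampling + mono channel selection).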

📦 Installation

First, install SpeechBrain with the following command:

pip install speechbrain

We encourage you to read the tutorials and learn more about SpeechBrain.

💻 Usage examples

Basic usage

import torchaudio
from speechbrain.inference.classifiers import EncoderClassifier

# Fetch the pretrained model (cached in savedir) and build the classifier.
classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")

# Classify two sample clips; text_lab holds the predicted keyword.
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
print(text_lab)
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
print(text_lab)
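
The four values returned by classify_file are related: index is the position of the best-scoring class in out_prob, score is that class's value, and text_lab is its decoded label. The sketch below illustrates that relationship in plain Python. It assumes one score per class, and the label order shown is hypothetical; the real order comes from the label encoder shipped with the pretrained model.

```python
# Hypothetical label order, for illustration only.
LABELS = ["yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "unknown", "silence"]


def top_prediction(scores, labels=LABELS):
    """Return (index, label, score) of the highest-scoring class."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    # Argmax over the per-class scores.
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, labels[index], scores[index]
```

A helper like this is only needed when working with raw scores; for normal use, text_lab already contains the decoded keyword.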

Advanced usage

Inference on GPU

To run inference on a GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
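
A minimal sketch of this option follows, assuming you want to fall back to the CPU when no GPU is present. The helper names make_run_opts and load_classifier are hypothetical; only from_hparams, run_opts, and torch.cuda.is_available come from the libraries themselves.

```python
def make_run_opts(cuda_available: bool) -> dict:
    """Build the run_opts dict passed to from_hparams."""
    return {"device": "cuda" if cuda_available else "cpu"}


def load_classifier():
    """Load the pretrained classifier on a GPU when one is available."""
    # Imported lazily so make_run_opts works without these packages installed.
    import torch
    from speechbrain.inference.classifiers import EncoderClassifier

    return EncoderClassifier.from_hparams(
        source="speechbrain/google_speech_command_xvector",
        savedir="pretrained_models/google_speech_command_xvector",
        run_opts=make_run_opts(torch.cuda.is_available()),
    )
```

Calling load_classifier() downloads the model on first use, so it needs network access and an installed SpeechBrain.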

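📚 Documentation

Pipeline description

The system consists of a TDNN model combined with statistical pooling. A classifier trained with categorical cross-entropy loss is applied on top.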
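
Statistical pooling is what turns a variable-length sequence of frame-level TDNN outputs into a single fixed-size vector for the classifier: it concatenates the per-feature mean and standard deviation computed over time. The plain-Python sketch below illustrates the idea only. SpeechBrain's own layer operates on tensors and may differ in details, and the use of the population standard deviation here is an assumption.

```python
import math


def statistics_pooling(frames):
    """Pool a [time][feature] sequence into one [2 * feature] vector.

    The output is the per-feature mean followed by the per-feature
    standard deviation, both computed over the time axis.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation (divide by n); an assumption of this sketch.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds
```

Whatever the clip length, the output size depends only on the feature dimension, which is why a fixed classifier can sit on top.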

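The system is trained on recordings sampled at 16 kHz (single channel). When classify_file is called, the code automatically normalizes the audio as needed (i.e., resampling + mono channel selection).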
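
To make the two normalization steps concrete, here is a deliberately simple sketch in plain Python. It is not SpeechBrain's implementation: averaging the channels is one assumed way of obtaining a mono signal, and linear interpolation stands in for a proper resampler with anti-aliasing. Both function names are hypothetical.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]


def resample_linear(signal, src_rate, dst_rate):
    """Resample by linear interpolation (illustrative, no anti-aliasing)."""
    if src_rate == dst_rate or len(signal) < 2:
        return list(signal)
    out_len = int(len(signal) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate  # position in source samples
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1.0 - frac) + signal[right] * frac)
    return out
```

With these helpers, a one-second stereo clip at 44.1 kHz would become a single channel of 16000 samples, which matches the format the model was trained on.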

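Training

To train the model from scratch, follow these steps:

1. Clone the SpeechBrain repository:

git clone https://github.com/speechbrain/speechbrain/

2. Install the dependencies:

cd speechbrain
pip install -r requirements.txt
pip install -e .

3. Run the training script:

cd recipes/Google-speech-commands
python train.py hparams/xvect.yaml --data_folder=your_data_folder

The training results (models, logs, etc.) are also available for download.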

Limitations

The SpeechBrain team does not guarantee the performance of this model on other datasets.

Citation

Citing xvectors

@inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
  author    = {David Snyder and
               Daniel Garcia{-}Romero and
               Alan McCree and
               Gregory Sell and
               Daniel Povey and
               Sanjeev Khudanpur},
  title     = {Spoken Language Recognition using X-vectors},
  booktitle = {Odyssey 2018},
  pages     = {105--111},
  year      = {2018},
}

Citing the Google Speech Commands dataset

@article{speechcommandsv2,
   author = { {Warden}, P.},
    title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1804.03209},
  primaryClass = "cs.CL",
  keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
    year = 2018,
    month = apr,
    url = {https://arxiv.org/abs/1804.03209},
}

📄 License

This project is released under the Apache 2.0 license.

About SpeechBrain

Website: https://speechbrain.github.io/
Code: https://github.com/speechbrain/speechbrain/
HuggingFace: https://huggingface.co/speechbrain/

Citing SpeechBrain

Please cite SpeechBrain if you use it for your research or business:

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}