# Paraformer with FunASR ONNX
Paraformer is an innovative non-autoregressive end-to-end speech recognition model. This repository shows how to use it with the funasr_onnx runtime, offering a high-efficiency speech recognition solution.
## Quick Start
Paraformer is a revolutionary non-autoregressive end-to-end speech recognition model. It can generate the target text for an entire sentence in parallel, greatly enhancing inference efficiency, especially when using GPUs. This repository demonstrates how to utilize Paraformer with the funasr_onnx runtime.
## Features
- High-Efficiency Inference: Paraformer can generate the target text for an entire sentence in parallel, which is well suited to parallel inference on GPUs, significantly improving inference efficiency and reducing machine costs for speech recognition cloud services by almost 10 times.
- Excellent Performance: It can achieve the same performance as autoregressive models on industrial-scale data and ranked first on the SpeechIO leaderboard.
- Multiple Industrial-Grade Models: We have released numerous industrial-grade models, including those for speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment).
## Installation
### Install funasr_onnx

```bash
pip install -U funasr_onnx
# For users in China, you can install from the SJTU mirror:
# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
```
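To confirm the package installed correctly, you can import it and print the installed version. This is an optional sanity check, not part of the official instructions; it relies only on the standard library's `importlib.metadata`.

```python
# Optional sanity check: confirm funasr_onnx is importable and report its version.
from importlib.metadata import version

import funasr_onnx  # raises ImportError if the installation failed

print("funasr_onnx version:", version("funasr_onnx"))
```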
### Download the model

```bash
git clone https://huggingface.co/funasr/paraformer-large
```
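If you prefer not to use git (for example, when git-lfs is unavailable), the same files can be fetched with the `huggingface_hub` library. This is an optional alternative, not part of the official instructions; the repo id `funasr/paraformer-large` is taken from the URL above.

```python
# Optional alternative: download the model files with huggingface_hub
# (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# Fetches the model files (model.onnx, config.yaml, am.mvn, ...) into ./paraformer-large.
model_dir = snapshot_download(repo_id="funasr/paraformer-large", local_dir="./paraformer-large")
print("Model downloaded to:", model_dir)
```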
## Usage Examples

### Basic Usage

#### Speech Recognition: Paraformer
```python
from funasr_onnx import Paraformer

# Path to the downloaded model directory (contains model.onnx, config.yaml, am.mvn).
model_dir = "./paraformer-large"
model = Paraformer(model_dir, batch_size=1, quantize=True)

# List of wav files to transcribe.
wav_path = ['./funasr/paraformer-large/asr_example.wav']

result = model(wav_path)
print(result)
```
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `batch_size`: `1` (default), the batch size during inference
- `device_id`: `-1` (default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure you have installed `onnxruntime-gpu`)
- `quantize`: `False` (default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (default), the number of threads used for intra-op parallelism on CPU
- Input: wav file(s), supported formats: `str`, `np.ndarray`, `List[str]`
- Output: `List[str]`, the recognition results
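The options above can be combined; the sketch below shows GPU inference with the quantized model and an `np.ndarray` input. The device id `0`, batch size `4`, the file path, and the use of `soundfile` to load the waveform are illustrative assumptions rather than values from the documentation.

```python
import soundfile as sf  # third-party reader used here only to produce an np.ndarray waveform
from funasr_onnx import Paraformer

model_dir = "./paraformer-large"

# device_id=0 assumes a single GPU and an onnxruntime-gpu install;
# quantize=True loads model_quant.onnx; intra_op_num_threads only affects CPU runs.
model = Paraformer(
    model_dir,
    batch_size=4,
    device_id=0,
    quantize=True,
    intra_op_num_threads=4,
)

# The runtime accepts a wav path (str), a list of paths (List[str]), or a raw np.ndarray.
waveform, _sample_rate = sf.read("./paraformer-large/asr_example.wav", dtype="float32")
print(model(waveform))                                 # np.ndarray input
print(model(["./paraformer-large/asr_example.wav"]))   # List[str] input
```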
## Documentation

### Performance benchmark

Please refer to the benchmark.
## License

This project is licensed under the Apache-2.0 license.
## Citations

```bibtex
@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}
```
## Information Table

| Property | Details |
|----------|---------|
| Model Type | Paraformer, an innovative non-autoregressive end-to-end speech recognition model |
| Training Data | Trained on a massive 60,000-hour Mandarin dataset from [FunASR](https://github.com/alibaba-damo-academy/FunASR) |
| Metrics | Accuracy, CER |
| Pipeline Tag | Automatic Speech Recognition |
| Tags | Paraformer, FunASR, ASR |