# Paraformer with FunASR ONNX
Paraformer is an innovative non-autoregressive end-to-end speech recognition model. This repository shows how to use it with the funasr_onnx runtime, offering a high-efficiency speech recognition solution.
## Quick Start
Paraformer is a revolutionary non-autoregressive end-to-end speech recognition model. It can generate the target text for an entire sentence in parallel, greatly enhancing inference efficiency, especially when using GPUs. This repository demonstrates how to utilize Paraformer with the funasr_onnx runtime.
## Features
- High-Efficiency Inference: Paraformer can generate the target text for an entire sentence in parallel, which is well suited to parallel inference on GPUs, significantly improving inference efficiency and reducing machine costs for speech recognition cloud services by almost 10 times.
- Excellent Performance: It can achieve the same performance as autoregressive models on industrial-scale data and ranked first on the SpeechIO leaderboard.
- Multiple Industrial-Grade Models: We have released numerous industrial-grade models, including those for speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment).
## Installation
### Install funasr_onnx

```bash
pip install -U funasr_onnx
# For users in China, you can install from the SJTU mirror:
# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
```
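To confirm the package installed correctly, you can import it and print the installed version. This is an optional sanity check, not part of the official instructions; it relies only on the standard library's `importlib.metadata`.

```python
# Optional sanity check: confirm funasr_onnx is importable and report its version.
from importlib.metadata import version

import funasr_onnx  # raises ImportError if the installation failed

print("funasr_onnx version:", version("funasr_onnx"))
```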
### Download the model

```bash
git clone https://huggingface.co/funasr/paraformer-large
```
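If you prefer not to use git (for example, when git-lfs is unavailable), the same files can be fetched with the `huggingface_hub` library. This is an optional alternative, not part of the official instructions; the repo id `funasr/paraformer-large` is taken from the URL above.

```python
# Optional alternative: download the model files with huggingface_hub
# (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# Fetches the model files (model.onnx, config.yaml, am.mvn, ...) into ./paraformer-large.
model_dir = snapshot_download(repo_id="funasr/paraformer-large", local_dir="./paraformer-large")
print("Model downloaded to:", model_dir)
```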
## Usage Examples

### Basic Usage

#### Speech Recognition: Paraformer
```python
from funasr_onnx import Paraformer

# Path to the downloaded model directory (contains model.onnx, config.yaml, am.mvn).
model_dir = "./paraformer-large"
model = Paraformer(model_dir, batch_size=1, quantize=True)

# List of wav files to transcribe.
wav_path = ['./funasr/paraformer-large/asr_example.wav']

result = model(wav_path)
print(result)
```
- `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
- `batch_size`: `1` (default), the batch size during inference
- `device_id`: `-1` (default), infer on CPU. To infer on GPU, set it to the gpu_id (please make sure you have installed `onnxruntime-gpu`)
- `quantize`: `False` (default), load `model.onnx` in `model_dir`. If set to `True`, load `model_quant.onnx` in `model_dir`
- `intra_op_num_threads`: `4` (default), the number of threads used for intra-op parallelism on CPU
- Input: wav file(s), supported formats: `str`, `np.ndarray`, `List[str]`
- Output: `List[str]`, the recognition results
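The options above can be combined; the sketch below shows GPU inference with the quantized model and an `np.ndarray` input. The device id `0`, batch size `4`, the file path, and the use of `soundfile` to load the waveform are illustrative assumptions rather than values from the documentation.

```python
import soundfile as sf  # third-party reader used here only to produce an np.ndarray waveform
from funasr_onnx import Paraformer

model_dir = "./paraformer-large"

# device_id=0 assumes a single GPU and an onnxruntime-gpu install;
# quantize=True loads model_quant.onnx; intra_op_num_threads only affects CPU runs.
model = Paraformer(
    model_dir,
    batch_size=4,
    device_id=0,
    quantize=True,
    intra_op_num_threads=4,
)

# The runtime accepts a wav path (str), a list of paths (List[str]), or a raw np.ndarray.
waveform, _sample_rate = sf.read("./paraformer-large/asr_example.wav", dtype="float32")
print(model(waveform))                                 # np.ndarray input
print(model(["./paraformer-large/asr_example.wav"]))   # List[str] input
```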
## Documentation

### Performance benchmark

Please refer to the benchmark.
## License

This project is licensed under the Apache-2.0 license.
## Citations

```bibtex
@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}
```
## Information Table

| Property | Details |
|----------|---------|
| Model Type | Paraformer, an innovative non-autoregressive end-to-end speech recognition model |
| Training Data | Trained on a massive 60,000-hour Mandarin dataset from [FunASR](https://github.com/alibaba-damo-academy/FunASR) |
| Metrics | Accuracy, CER |
| Pipeline Tag | Automatic Speech Recognition |
| Tags | Paraformer, FunASR, ASR |