wav2vec2-large-xlsr-53-chinese Speech Model - Open source and free, supporting Chinese speech recognition

Wav2vec2 Large Xlsr 53 Chinese Zn Cn Aishell1

Developed by qinyue

A Chinese speech recognition model based on facebook/wav2vec2-large-xlsr-53 and fine-tuned on the AISHELL-1 dataset.

Speech Recognition

Transformers

Chinese

Open Source License: Apache-2.0

#Chinese Speech Recognition #Low WER #No Language Model Dependency

Release Time: 6/16/2022

Model Overview

This model is an automatic speech recognition (ASR) model specifically optimized for Chinese speech, capable of converting Chinese speech into text.

Model Features

Chinese Speech Recognition

A recognition model specifically optimized for Chinese speech, reaching a 7.04% word error rate (WER) on the AISHELL-1 test set.

No Language Model Required

Can be used directly without additional language model support.

High Accuracy

Achieves a word error rate (WER) of 7.04% on the AISHELL-1 test set, which can be reduced to 3.96% with a language model.

Model Capabilities

Chinese Speech Recognition

16kHz Sampling Rate Audio Processing

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts

About 92.96% accuracy (WER 7.04%), as measured on the AISHELL-1 test set

Voice Assistant

Used for human-computer interaction in Chinese voice assistants

Speech Analysis

Speech Content Analysis

Analyze keywords and topics in speech content

🚀 Wav2Vec2-Large-XLSR-53-Chinese-zh-CN-aishell1

This model is fine-tuned from facebook/wav2vec2-large-xlsr-53 on Chinese using the AISHELL-1 dataset. It's designed for automatic speech recognition tasks.

🚀 Quick Start

When using this model, make sure that your speech input is sampled at 16kHz.

✨ Features

Dataset: Fine-tuned on the AISHELL-1 dataset for Chinese speech recognition.
Metrics: Evaluated using Word Error Rate (WER).
Compatibility: Requires speech input sampled at 16kHz.
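
Because the model expects 16kHz input, it can help to check a file's sampling rate before transcription. The following is a minimal sketch using only the Python standard library; the helper names (wav_sample_rate, needs_resampling, write_silent_wav) and the demo file names are hypothetical and not part of the model's own tooling, and the sketch only handles uncompressed PCM WAV files.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16000  # sampling rate the model card asks for


def wav_sample_rate(path):
    """Read the sampling rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True when the file's rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate


def write_silent_wav(path, rate, seconds=0.1):
    """Write a short mono 16-bit silent WAV, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(rate * seconds))


# Demo: one file at 8 kHz (needs resampling) and one at 16 kHz (does not).
with tempfile.TemporaryDirectory() as tmp_dir:
    path_8k = os.path.join(tmp_dir, "demo_8k.wav")
    path_16k = os.path.join(tmp_dir, "demo_16k.wav")
    write_silent_wav(path_8k, 8000)
    write_silent_wav(path_16k, 16000)
    check_8k = needs_resampling(path_8k)
    check_16k = needs_resampling(path_16k)

print(check_8k, check_16k)
```

Loading audio with librosa.load(filepath, sr=16000), as in the Basic Usage example, resamples on load, so a check like this is mainly useful when audio is read through another path.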

📦 Installation

The original README gives no specific installation steps. To use this model, install the required libraries (torch, librosa, and transformers) with pip:

pip install torch librosa transformers

💻 Usage Examples

Basic Usage

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Run on the first GPU when available, otherwise on the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = Wav2Vec2Processor.from_pretrained(
    'qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1')
model = Wav2Vec2ForCTC.from_pretrained(
    'qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1').to(device)

# Load the audio as mono and resample it to the 16kHz the model expects.
filepath = 'test.wav'
audio, sr = librosa.load(filepath, sr=16000, mono=True)

# The feature extractor's keyword is `sampling_rate`, not `sample_rate`.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(inputs.input_values,
                   attention_mask=inputs.attention_mask).logits

# Greedy decoding: take the most likely token for each frame, then decode.
predicted_ids = torch.argmax(logits, dim=-1)
pred_str = processor.decode(predicted_ids[0])

print(pred_str)
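
The argmax-then-decode step in the example above is greedy CTC decoding: consecutive repeated frame predictions are collapsed and blank tokens are dropped. The sketch below illustrates that rule on a toy input. The vocabulary, the blank ID, and the function name ctc_greedy_decode are assumptions made for illustration; the real tokenizer has its own vocabulary and special-token handling.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blanks."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary; ID 0 plays the role of the blank token.
toy_vocab = {0: "<pad>", 1: "北", 2: "京"}

# Per-frame argmax IDs, as torch.argmax(logits, dim=-1) would produce.
frames = [0, 1, 1, 0, 2, 2, 2, 0]
decoded = ctc_greedy_decode(frames, toy_vocab)
print(decoded)  # 北京

# A blank between two identical IDs keeps both characters.
doubled = ctc_greedy_decode([1, 0, 1], toy_vocab)
print(doubled)  # 北北
```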

Advanced Usage

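import numpy as np
from datasets import load_metric

# Assumes `processor` is loaded as in Basic Usage. The "wer" metric also
# needs the jiwer package. In newer releases of datasets, load_metric has
# been removed; evaluate.load("wer") is the replacement.
wer_metric = load_metric("wer")

def compute_metrics(pred):
    # Greedy decoding: most likely token ID for each frame.
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)

    # Labels padded with -100 (ignored by the loss) cannot be decoded,
    # so map them back to the tokenizer's pad token first.
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids, spaces_between_special_tokens=True)
    # Labels are not CTC output, so repeated tokens must not be collapsed.
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False, spaces_between_special_tokens=True)

    wer = wer_metric.compute(predictions=pred_str, references=label_str)

    return {"wer": wer}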
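
The label-masking line in compute_metrics can be illustrated on its own. Label positions filled with -100 are skipped by the loss during training, but -100 is not a valid token ID, so it has to be swapped for the pad token before decoding. This is a small sketch with made-up label IDs and a hypothetical pad_token_id of 0; the real value comes from processor.tokenizer.pad_token_id.

```python
import numpy as np

IGNORE_INDEX = -100  # label value skipped by the loss
pad_token_id = 0     # hypothetical; use processor.tokenizer.pad_token_id in practice

# Two made-up label sequences, padded to the same length with -100.
label_ids = np.array([
    [5, 6, 7, IGNORE_INDEX],
    [8, 9, IGNORE_INDEX, IGNORE_INDEX],
])

# np.where returns a new array and leaves label_ids untouched, unlike the
# in-place assignment used in compute_metrics.
restored = np.where(label_ids == IGNORE_INDEX, pad_token_id, label_ids)
print(restored.tolist())  # [[5, 6, 7, 0], [8, 9, 0, 0]]
```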

📚 Documentation

Results

Reference	Prediction
据伟业我爱我家市场研究院测算	据北业我爱我家市场研究院测算
七月北京公积金贷款成交量提升了百分之五	七月北京公积金贷款成交量提升了百分之五
培育门类丰富层次齐用的综合利用产业	培育门类丰富层资集业的综合利用产业
我们迎来了赶超发达国家的难得机遇	我们迎来了赶超发达国家的单得机遇
坚持基本草原保护制度	坚持基本草员保护制度
强化水生生态修复和建设	强化水生生态修复和建设
温州两男子为争女人驾奔驰宝马街头四次对撞	温州两男子为争女人架奔驰宝马接头四次对重
她表示应该是吃吃饭看电影之类的	他表示一的是吃吃饭看电影之理
加强畜禽遗传资源和农业野生植物资源保护	加强续紧遗传资源和农业野生职物资源保护
两人都是依赖电话沟通	两人都是依赖电话沟通

Test Result:

In the table below I report the Word Error Rate (WER) of the model on the AISHELL-1 test dataset.

Model	WER	WER-with-LM
qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1	7.04%	3.96%
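
To make the metric concrete, here is a minimal sketch of an edit-distance error rate, assuming each Chinese character counts as one token. It is not the evaluation code behind the numbers above, and the function names are hypothetical. The two sentence pairs are taken from the Results table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(references, predictions):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(list(r), list(p))
                 for r, p in zip(references, predictions))
    total = sum(len(r) for r in references)
    return errors / total


# Two reference/prediction pairs from the Results table.
refs = ["坚持基本草原保护制度", "两人都是依赖电话沟通"]
hyps = ["坚持基本草员保护制度", "两人都是依赖电话沟通"]

rate = char_error_rate(refs, hyps)
print(f"{rate:.2%}")  # 5.00% (1 substitution over 20 reference characters)
```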

📄 License

This model is licensed under the Apache-2.0 license.
