🚀 ruT5-ASR
The ruT5-ASR model is designed to correct errors in ASR output, improving the accuracy of speech recognition results.
🚀 Quick Start
The ruT5-ASR model, trained by bond005, is based on ruT5-base. It corrects errors in ASR output, especially the output of Wav2Vec2-Large-Ru-Golos.
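As a minimal sketch (the full example with post-processing appears under Usage Examples below), the checkpoint can be loaded like any other sequence-to-sequence model from the Hub; the sample input is one of the examples shown further down this page, and the generation settings here are only illustrative:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the ruT5-ASR checkpoint as an ordinary seq2seq model.
tokenizer = T5Tokenizer.from_pretrained('bond005/ruT5-ASR')
model = T5ForConditionalGeneration.from_pretrained('bond005/ruT5-ASR')

# Correct a noisy ASR hypothesis (beam size and max_length chosen for illustration).
batch = tokenizer('нейро сети эта харошо', return_tensors='pt')
generated = model.generate(**batch, num_beams=5, max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```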
✨ Features
- Error Correction: Specifically designed to rectify errors in ASR output.
- Stand-alone Usage: Can be used as a standalone sequence-to-sequence model.
📦 Installation
Installation only requires the necessary Python libraries. You can use pip to install transformers and torch:
```bash
pip install transformers torch
```
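If you want to verify the environment, the following optional check (a sketch, not part of the original model card) confirms that both libraries import and reports whether a CUDA device is visible:

```python
import torch
import transformers

# Sanity check: both libraries import, and we know whether GPU inference is possible.
print('transformers:', transformers.__version__)
print('torch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
```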
💻 Usage Examples
Basic Usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch


def rescore(text: str, tokenizer: T5Tokenizer,
            model: T5ForConditionalGeneration) -> str:
    """Correct an ASR hypothesis with ruT5-ASR and normalize the result."""
    if len(text) == 0:
        return ''
    ru_letters = set('аоуыэяеёюибвгдйжзклмнпрстфхцчшщьъ')
    punct = set('.,:/\\?!()[]{};"\'-')
    x = tokenizer(text, return_tensors='pt', padding=True).to(model.device)
    max_size = int(x.input_ids.shape[1] * 1.5 + 10)
    min_size = 3
    if x.input_ids.shape[1] <= min_size:  # too short to rescore
        return text
    out = model.generate(**x, do_sample=False, num_beams=5,
                         max_length=max_size, min_length=min_size)
    res = tokenizer.decode(out[0], skip_special_tokens=True).lower().strip()
    res = ' '.join(res.split())
    # Keep only Russian letters; map punctuation and whitespace to single spaces.
    postprocessed = ''
    for cur in res:
        if cur.isspace() or (cur in punct):
            postprocessed += ' '
        elif cur in ru_letters:
            postprocessed += cur
    return (' '.join(postprocessed.strip().split())).replace('ё', 'е')


tokenizer_for_rescoring = T5Tokenizer.from_pretrained('bond005/ruT5-ASR')
model_for_rescoring = T5ForConditionalGeneration.from_pretrained('bond005/ruT5-ASR')
if torch.cuda.is_available():
    model_for_rescoring = model_for_rescoring.cuda()

input_examples = [
    'уласны в москве интерне только в большом году что лепровели',
    'мороз и солнце день чудесный',
    'нейро сети эта харошо',
    'да'
]

for src in input_examples:
    rescored = rescore(src, tokenizer_for_rescoring, model_for_rescoring)
    print(f'{src} -> {rescored}')
```
The output of the above code is as follows:
```text
уласны в москве интерне только в большом году что лепровели -> у нас в москве интернет только в прошлом году что ли провели
мороз и солнце день чудесный -> мороз и солнце день чудесный
нейро сети эта харошо -> нейросети это хорошо
да -> да
```
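Advanced Usage
Since the model is meant to correct Wav2Vec2-Large-Ru-Golos transcripts, it is natural to chain the two models. The sketch below is an illustration rather than the official pipeline: it assumes the companion bond005/wav2vec2-large-ru-golos checkpoint, an audio file at the placeholder path speech.wav, and it reuses rescore, tokenizer_for_rescoring and model_for_rescoring from the Basic Usage example above:

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import torchaudio

# Transcribe speech with the companion Wav2Vec2 acoustic model,
# then correct the raw transcript with ruT5-ASR.
asr_processor = Wav2Vec2Processor.from_pretrained('bond005/wav2vec2-large-ru-golos')
asr_model = Wav2Vec2ForCTC.from_pretrained('bond005/wav2vec2-large-ru-golos')

# Load the audio (placeholder path) and resample to 16 kHz if needed.
waveform, sample_rate = torchaudio.load('speech.wav')
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = asr_processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors='pt')
with torch.no_grad():
    logits = asr_model(inputs.input_values).logits
raw_transcript = asr_processor.batch_decode(torch.argmax(logits, dim=-1))[0]

# Reuse rescore() and the ruT5-ASR tokenizer/model from the Basic Usage example.
corrected = rescore(raw_transcript, tokenizer_for_rescoring, model_for_rescoring)
print(f'{raw_transcript} -> {corrected}')
```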
📚 Documentation
Evaluation
This model was evaluated on the test subsets of SberDevices Golos, Common Voice 6.0 (the Russian part), and Russian Librispeech, although it was trained only on the training subset of SberDevices Golos. The evaluation script for other datasets, including Russian Librispeech and SOVA RuDevices, is available on Kaggle: https://www.kaggle.com/code/bond005/wav2vec2-t5-ru-eval.
**Comparison with "pure" Wav2Vec2-Large-Ru-Golos (WER, %)**:
| Dataset Name | Pure ASR | ASR with Rescoring |
|---|---|---|
| Voxforge Ru | 27.08 | 40.48 |
| Russian LibriSpeech | 21.87 | 23.77 |
| Sova RuDevices | 25.41 | 20.13 |
| Golos Crowd | 10.14 | 9.42 |
| Golos Farfield | 20.35 | 17.99 |
| CommonVoice Ru | 18.55 | 11.60 |
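To compute comparable WER figures on your own data, you can use a library such as jiwer (installed with pip install jiwer). The snippet below is only a sketch, not the official evaluation code (that is the Kaggle notebook linked above); the reference/hypothesis pair is borrowed from the usage example:

```python
from jiwer import wer

# One reference transcript and the corresponding raw ASR hypothesis,
# taken from the usage example above.
references = ['у нас в москве интернет только в прошлом году что ли провели']
hypotheses = ['уласны в москве интерне только в большом году что лепровели']

print(f'WER: {wer(references, hypotheses) * 100:.2f}%')
```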
📄 License
This project is licensed under the Apache 2.0 license.