🚀 ruT5-ASR
The ruT5-ASR model is designed to correct errors in ASR output, improving the accuracy of speech recognition results.
🚀 Quick Start
The ruT5-ASR model, trained by bond005, is based on ruT5-base. It corrects errors in ASR output, especially the output of Wav2Vec2-Large-Ru-Golos.
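As a minimal sketch (the full example with post-processing appears under Usage Examples below), the checkpoint can be loaded like any other sequence-to-sequence model from the Hub; the sample input is one of the examples shown further down this page, and the generation settings here are only illustrative:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the ruT5-ASR checkpoint as an ordinary seq2seq model.
tokenizer = T5Tokenizer.from_pretrained('bond005/ruT5-ASR')
model = T5ForConditionalGeneration.from_pretrained('bond005/ruT5-ASR')

# Correct a noisy ASR hypothesis (beam size and max_length chosen for illustration).
batch = tokenizer('нейро сети эта харошо', return_tensors='pt')
generated = model.generate(**batch, num_beams=5, max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```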
✨ Features
- Error Correction: Specifically designed to rectify errors in ASR output.
- Stand-alone Usage: Can be used as a standalone sequence-to-sequence model.
📦 Installation
Installation only requires the necessary Python libraries. You can use pip to install transformers and torch:
```bash
pip install transformers torch
```
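If you want to verify the environment, the following optional check (a sketch, not part of the original model card) confirms that both libraries import and reports whether a CUDA device is visible:

```python
import torch
import transformers

# Sanity check: both libraries import, and we know whether GPU inference is possible.
print('transformers:', transformers.__version__)
print('torch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
```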
💻 Usage Examples
Basic Usage
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch


def rescore(text: str, tokenizer: T5Tokenizer,
            model: T5ForConditionalGeneration) -> str:
    """Correct an ASR hypothesis with ruT5-ASR and normalize the result."""
    if len(text) == 0:
        return ''
    ru_letters = set('аоуыэяеёюибвгдйжзклмнпрстфхцчшщьъ')
    punct = set('.,:/\\?!()[]{};"\'-')
    x = tokenizer(text, return_tensors='pt', padding=True).to(model.device)
    max_size = int(x.input_ids.shape[1] * 1.5 + 10)
    min_size = 3
    if x.input_ids.shape[1] <= min_size:  # too short to rescore
        return text
    out = model.generate(**x, do_sample=False, num_beams=5,
                         max_length=max_size, min_length=min_size)
    res = tokenizer.decode(out[0], skip_special_tokens=True).lower().strip()
    res = ' '.join(res.split())
    # Keep only Russian letters; map punctuation and whitespace to single spaces.
    postprocessed = ''
    for cur in res:
        if cur.isspace() or (cur in punct):
            postprocessed += ' '
        elif cur in ru_letters:
            postprocessed += cur
    return (' '.join(postprocessed.strip().split())).replace('ё', 'е')


tokenizer_for_rescoring = T5Tokenizer.from_pretrained('bond005/ruT5-ASR')
model_for_rescoring = T5ForConditionalGeneration.from_pretrained('bond005/ruT5-ASR')
if torch.cuda.is_available():
    model_for_rescoring = model_for_rescoring.cuda()

input_examples = [
    'уласны в москве интерне только в большом году что лепровели',
    'мороз и солнце день чудесный',
    'нейро сети эта харошо',
    'да'
]

for src in input_examples:
    rescored = rescore(src, tokenizer_for_rescoring, model_for_rescoring)
    print(f'{src} -> {rescored}')
```
The output of the above code is as follows:
```text
уласны в москве интерне только в большом году что лепровели -> у нас в москве интернет только в прошлом году что ли провели
мороз и солнце день чудесный -> мороз и солнце день чудесный
нейро сети эта харошо -> нейросети это хорошо
да -> да
```
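Advanced Usage
Since the model is meant to correct Wav2Vec2-Large-Ru-Golos transcripts, it is natural to chain the two models. The sketch below is an illustration rather than the official pipeline: it assumes the companion bond005/wav2vec2-large-ru-golos checkpoint, an audio file at the placeholder path speech.wav, and it reuses rescore, tokenizer_for_rescoring and model_for_rescoring from the Basic Usage example above:

```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import torchaudio

# Transcribe speech with the companion Wav2Vec2 acoustic model,
# then correct the raw transcript with ruT5-ASR.
asr_processor = Wav2Vec2Processor.from_pretrained('bond005/wav2vec2-large-ru-golos')
asr_model = Wav2Vec2ForCTC.from_pretrained('bond005/wav2vec2-large-ru-golos')

# Load the audio (placeholder path) and resample to 16 kHz if needed.
waveform, sample_rate = torchaudio.load('speech.wav')
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = asr_processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors='pt')
with torch.no_grad():
    logits = asr_model(inputs.input_values).logits
raw_transcript = asr_processor.batch_decode(torch.argmax(logits, dim=-1))[0]

# Reuse rescore() and the ruT5-ASR tokenizer/model from the Basic Usage example.
corrected = rescore(raw_transcript, tokenizer_for_rescoring, model_for_rescoring)
print(f'{raw_transcript} -> {corrected}')
```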
📚 Documentation
Evaluation
This model was evaluated on the test subsets of SberDevices Golos, Common Voice 6.0 (the Russian part), and Russian Librispeech, although it was trained only on the training subset of SberDevices Golos. The evaluation script for other datasets, including Russian Librispeech and SOVA RuDevices, is available on Kaggle: https://www.kaggle.com/code/bond005/wav2vec2-t5-ru-eval.
**Comparison with "pure" Wav2Vec2-Large-Ru-Golos (WER, %)**:
| Dataset Name | Pure ASR | ASR with Rescoring |
|---|---|---|
| Voxforge Ru | 27.08 | 40.48 |
| Russian LibriSpeech | 21.87 | 23.77 |
| Sova RuDevices | 25.41 | 20.13 |
| Golos Crowd | 10.14 | 9.42 |
| Golos Farfield | 20.35 | 17.99 |
| CommonVoice Ru | 18.55 | 11.60 |
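To compute comparable WER figures on your own data, you can use a library such as jiwer (installed with pip install jiwer). The snippet below is only a sketch, not the official evaluation code (that is the Kaggle notebook linked above); the reference/hypothesis pair is borrowed from the usage example:

```python
from jiwer import wer

# One reference transcript and the corresponding raw ASR hypothesis,
# taken from the usage example above.
references = ['у нас в москве интернет только в прошлом году что ли провели']
hypotheses = ['уласны в москве интерне только в большом году что лепровели']

print(f'WER: {wer(references, hypotheses) * 100:.2f}%')
```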
📄 License
This project is licensed under the Apache 2.0 license.