Wav2vec2 Open-Source Russian Speech Recognition Model - Accurately Recognize Speech Content for Free

Wav2vec2 Large 100k Voxpopuli Ft Common Voice Plus TTS Dataset Russian

Developed by Edresson

This is a speech recognition model based on Facebook's wav2vec2-large-100k-voxpopuli, fine-tuned using Common Voice 7.0 and M-AILABS Russian data.

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Russian Speech Recognition #High Accuracy (WER 24.8) #Multi-source Data Fine-tuning

Downloads 25

Release Time: 3/2/2022

Model Overview

This model is intended for Russian speech recognition: it converts Russian speech into text.

Model Features

High-Accuracy Russian Speech Recognition

Achieves a 24.80% Word Error Rate (WER) on the Common Voice 7.0 Russian test set.

Multi-source Data Training

Combines high-quality Russian speech datasets from Common Voice and M-AILABS for fine-tuning.

Transformer-based Architecture

Built on the wav2vec2 architecture, which learns speech representations directly from raw audio waveforms.

Model Capabilities

Russian Speech Recognition

Speech-to-Text

Audio Processing

Use Cases

Speech Transcription

Russian Speech Transcription

Convert Russian speech content into text format

Word Error Rate: 24.80% (Common Voice 7.0 Russian test set)

Voice Assistants

Russian Voice Command Recognition

Used for voice command recognition in Russian voice assistants or smart home devices

🚀 Wav2vec2 Large 100k Voxpopuli fine-tuned with Common Voice and M-AILABS in Russian

This project fine-tunes Wav2vec2 Large 100k Voxpopuli on Russian speech using the Common Voice 7.0 and M-AILABS datasets, with the aim of improving automatic speech recognition for Russian.

🚀 Quick Start

Prerequisites

The model is based on the transformers library, so make sure you have it installed.
You also need torchaudio for audio processing.

Installation

You can install the required libraries using pip:

pip install transformers torchaudio

✨ Features

Fine-tuned for Russian: The model is fine-tuned on Russian speech from the Common Voice 7.0 and M-AILABS datasets, so it is adapted to the characteristics of spoken Russian.
Speech Recognition Performance: It reaches a 24.80% Word Error Rate (WER) on the Common Voice 7.0 Russian test set.

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, Wav2Vec2ForCTC
  
tokenizer = AutoTokenizer.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-russian")
model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-russian")
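
The snippet above only loads the tokenizer and the model. A CTC model such as Wav2Vec2ForCTC emits one score vector per audio frame, and text is typically recovered by taking the best token per frame, merging consecutive repeats, and dropping the blank token. The following is a minimal pure-Python sketch of that greedy decoding step. The toy vocabulary, the blank index, and the "|" word separator are assumptions made for illustration; they are not read from this model's files.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real model ships its own vocabulary.
VOCAB = ["<pad>", "|", "д", "а", "н", "е", "т"]
BLANK_ID = 0      # assumption: index 0 acts as the CTC blank
WORD_SEP = "|"    # assumption: "|" marks word boundaries


def argmax(row):
    """Return the index of the largest score in one frame."""
    return max(range(len(row)), key=row.__getitem__)


def greedy_ctc_decode(frame_scores):
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks."""
    ids = [argmax(row) for row in frame_scores]
    collapsed = [token_id for token_id, _ in groupby(ids)]
    tokens = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    return "".join(tokens).replace(WORD_SEP, " ").strip()


def one_hot(index, size=len(VOCAB)):
    """Build a fake score vector that puts all mass on one token."""
    return [1.0 if j == index else 0.0 for j in range(size)]


# Fake per-frame predictions standing in for model output.
frame_ids = [2, 2, 0, 3, 3, 1, 4, 5, 5, 0, 6]
scores = [one_hot(i) for i in frame_ids]
print(greedy_ctc_decode(scores))
```

With a real model, the per-frame scores would come from the model's output logits and the mapping from ids to characters from the loaded tokenizer; the blank between two identical tokens is what lets a doubled letter survive the merge step.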

Advanced Usage

Example test with Common Voice Dataset

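import re

import torchaudio
from datasets import load_dataset

# chars_to_ignore_regex, map_to_pred and wer are not defined in this snippet;
# define them before running it.

# "ru" selects the Russian subset; point data_dir at your local Common Voice download.
dataset = load_dataset("common_voice", "ru", split="test", data_dir="./cv-corpus-6.1-2020-12-11")

# Resample from 48 kHz (the rate this example assumes for the clips) to 16 kHz.
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    # Load one clip, resample it, and normalize its reference transcript.
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("’", "'")
    return batch

ds = dataset.map(map_to_array)
# Run the model on each clip, then compute WER over all predictions.
result = ds.map(map_to_pred, batched=True, batch_size=1, remove_columns=list(ds.features.keys()))
print(wer.compute(predictions=result["predicted"], references=result["target"]))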
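
The evaluation snippet relies on a `chars_to_ignore_regex` pattern and a `wer` metric object that it does not define. As a rough illustration of what those two pieces do, here is a self-contained sketch of transcript normalization and word-level WER (word edit distance divided by the number of reference words). The punctuation set in `CHARS_TO_IGNORE` and both function names are assumptions chosen for this example, not the values used to produce the reported 24.80%.

```python
import re

# Assumption: an illustrative punctuation set, not the one used by the author.
CHARS_TO_IGNORE = r'[,?.!;:"«»]'


def normalize(text):
    """Strip punctuation, lowercase, and unify apostrophes."""
    return re.sub(CHARS_TO_IGNORE, "", text).lower().replace("’", "'").strip()


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


target = normalize("Привет, как дела?")
predicted = normalize("привет как дело")
print(f"WER: {word_error_rate(target, predicted):.2%}")
```

One substituted word out of three reference words gives a WER of one third here. A corpus-level figure is normally computed by summing edits and reference words over all utterances rather than averaging per-utterance rates.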

📚 Documentation

Model Information

Property	Details
Model Type	Wav2vec2 Large 100k Voxpopuli fine-tuned with Common Voice and M-AILABS in Russian
Training Data	Common Voice 7.0 and M-AILABS

Results

For the results, check the paper.

📄 License

This model is licensed under the Apache-2.0 license.
