Wav2vec2-xlsr-Chuvash Open-source Model - Free Deployment to Boost Automatic Speech Recognition for Chuvash

Wav2vec2 Xlsr Chuvash

Developed by gagan3012

A fine-tuned model based on facebook/wav2vec2-large-xlsr-53 for Chuvash automatic speech recognition tasks.

Category: Speech Recognition, Other
Open Source License: Apache-2.0
Tags: #Chuvash speech recognition #Low-resource language ASR #XLSR fine-tuning

Downloads 54

Release Time : 3/2/2022

Model Overview

This model is an automatic speech recognition (ASR) model fine-tuned on the Chuvash language using Facebook's wav2vec2-large-xlsr-53 model. It is trained with the Common Voice dataset and supports speech-to-text functionality for Chuvash.

Model Features

Chuvash language support

Speech recognition model specifically optimized for the Chuvash language

Based on XLSR-53 architecture

Builds on XLSR-53, a large-scale pre-trained model for cross-lingual speech representation learning

No language model required

Can be used directly without additional language model support

Model Capabilities

Chuvash speech recognition

Audio to text conversion

16kHz audio processing

Use Cases

Speech transcription

Chuvash speech transcription

Convert Chuvash speech content into text

Achieves a WER of 48.40% on the Common Voice test set

Voice assistant applications

Chuvash voice assistant

Used for developing Chuvash voice-controlled applications

🚀 Wav2Vec2-Large-XLSR-53-Chuvash

This model is fine-tuned from facebook/wav2vec2-large-xlsr-53 on Chuvash using the Common Voice dataset. It provides speech recognition for the Chuvash language and reaches a test WER of 48.40% on Common Voice.

Model Information

Property	Details
Model Type	Wav2Vec2-Large-XLSR-53-Chuvash
Training Data	Common Voice (Chuvash)
Metrics	Word Error Rate (WER)
License	Apache-2.0

Model Performance

Task	Dataset	Metric	Value
Automatic Speech Recognition	Common Voice cv	Test WER	48.40
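
To make the reported metric concrete, here is a minimal sketch of how word error rate (and the character-level variant) can be computed from an edit distance. The function names are illustrative and are not part of the model card; the evaluation script further down relies on library metrics instead. A WER of 48.40 means roughly 48 word-level edits per 100 reference words.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            cost = 0 if ref_token == hyp_token else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Toy example: one substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Corpus-level scores are usually obtained by summing edits and reference words over all utterances before dividing, so they can differ from the average of per-sentence values.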

⚠️ Important Note

When using this model, make sure that your speech input is sampled at 16kHz.
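
A quick pre-flight check can catch inputs recorded at another rate. The sketch below assumes the input is an uncompressed WAV file and uses only Python's standard `wave` module; the helper names are hypothetical, and other formats (such as the MP3 clips in Common Voice) would need an audio library instead.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate the note above asks for


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected_rate: int = EXPECTED_RATE) -> bool:
    """True when the file must be resampled before it is passed to the model."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling` returns True, resample the audio to 16 kHz first, for example with `torchaudio.transforms.Resample` as in the usage examples.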

🚀 Quick Start

✨ Features

Fine-tuned on Chuvash language data from Common Voice.
Can be used for automatic speech recognition tasks without a language model.

📦 Installation

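The original model card gives no installation steps. The usage examples import torch, torchaudio, datasets, and transformers, so these packages must be installed before running them.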

💻 Usage Examples

Basic Usage

import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "cv", split="test")

processor = Wav2Vec2Processor.from_pretrained("gagan3012/wav2vec2-xlsr-chuvash") 
model = Wav2Vec2ForCTC.from_pretrained("gagan3012/wav2vec2-xlsr-chuvash") 

# Common Voice clips are assumed to be 48 kHz; resample them to the 16 kHz the model expects.
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocess the dataset:
# read each audio file into an array and resample it.
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])

Results:

Prediction: ['проектпа килӗшӳллӗн тӗлӗ мероприяти иртермелле', 'твăра çак планета минтӗ пуяни калленнана']

Reference: ['Проектпа килӗшӳллӗн, тӗрлӗ мероприяти ирттермелле.', 'Çак планета питĕ пуян иккен.']
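
The `torch.argmax` call followed by `processor.batch_decode` in the example above amounts to greedy CTC decoding: pick the most likely token per frame, collapse consecutive repeats, then drop the blank token. The sketch below illustrates that idea on a toy vocabulary; the token ids, the vocabulary, and the function name are hypothetical and do not reflect the model's real tokenizer.

```python
from itertools import groupby
from typing import Dict, List

BLANK_ID = 0  # hypothetical id of the CTC blank token


def ctc_greedy_decode(
    frame_ids: List[int],
    id_to_char: Dict[int, str],
    blank_id: int = BLANK_ID,
) -> str:
    """Collapse repeated frame predictions, then remove blanks."""
    # Step 1: merge runs of identical ids (one id per run).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop blanks and map the remaining ids to characters.
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


toy_vocab = {1: "c", 2: "a", 3: "t", 4: "o"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]  # per-frame argmax ids
print(ctc_greedy_decode(frames, toy_vocab))  # cat
```

A blank between two identical ids keeps them as separate characters, which is how doubled letters survive the collapse step.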

Evaluation

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

# Notebook shell commands (Jupyter/Colab): download a CER metric script into ./cer
!mkdir cer
!wget -O cer/cer.py https://huggingface.co/ctl/wav2vec2-large-xlsr-cantonese/raw/main/cer.py

test_dataset = load_dataset("common_voice", "cv", split="test")  # "cv" is the Chuvash language code
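# Note: load_metric is deprecated in newer `datasets` releases; the `evaluate` package offers evaluate.load as a replacement.
wer = load_metric("wer")
cer = load_metric("cer")  # loads the local cer/cer.py script downloaded above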

processor = Wav2Vec2Processor.from_pretrained("gagan3012/wav2vec2-xlsr-chuvash") 
model = Wav2Vec2ForCTC.from_pretrained("gagan3012/wav2vec2-xlsr-chuvash") 
model.to("cuda")

chars_to_ignore_regex = r'[,?.!\-;:"“]'  # punctuation stripped from the references before scoring
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocess the dataset:
# normalize each reference sentence, then read and resample the audio.
def speech_file_to_array_fn(batch):
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

# Run batched inference on the GPU and decode the predictions to text.
def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits

    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)

print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
print("CER: {:.2f}".format(100 * cer.compute(predictions=result["pred_strings"], references=result["sentence"])))

Test Result (WER): 48.40%
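
The evaluation script strips punctuation from each reference and lowercases it before scoring, which matches the lowercase, unpunctuated predictions shown in the results. A minimal standalone sketch of that step follows; the character set is an assumption about the pattern intended by `chars_to_ignore_regex`, and the function name is illustrative.

```python
import re

# Assumed punctuation set, based on chars_to_ignore_regex in the evaluation script.
CHARS_TO_IGNORE = r'[,?.!\-;:"“]'


def normalize_reference(sentence: str) -> str:
    """Remove ignored punctuation and lowercase the sentence."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower()


print(normalize_reference("Hello, World!"))  # hello world
```

Skipping this step would count every capital letter and punctuation mark in the references as an error, inflating both WER and CER.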

🔧 Technical Details

The script used for training can be found here

📄 License

This model is licensed under the Apache-2.0 license.
