# shivam/wav2vec2-xls-r-hindi
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - HI dataset. It is intended for automatic speech recognition (ASR), i.e. transcribing spoken Hindi to text.
## Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - HI dataset.
Its results on the evaluation set are reported in the training results table under Documentation below.
## Usage Examples
### Basic Usage
Basic usage consists of loading the model and processor, reading an audio clip, and decoding the model's CTC output. A simple example, assuming the transformers, datasets, torch, and soundfile libraries are installed:
```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
from datasets import load_dataset
import soundfile as sf

# Load the fine-tuned model and its processor from the Hub
model = Wav2Vec2ForCTC.from_pretrained("shivam/wav2vec2-xls-r-hindi")
processor = Wav2Vec2Processor.from_pretrained("shivam/wav2vec2-xls-r-hindi")

# Read each referenced audio file into a float array
def map_to_array(batch):
    speech, _ = sf.read(batch["path"])
    batch["speech"] = speech
    return batch

dataset = load_dataset("mozilla-foundation/common_voice_7_0", "hi", split="test[:10]")
dataset = dataset.map(map_to_array)

# The model expects 16 kHz input; Common Voice clips are recorded at
# 48 kHz, so resample first if needed.
input_values = processor(
    dataset["speech"][0], sampling_rate=16_000, return_tensors="pt"
).input_values

# Greedy CTC decoding of the logits
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
print(transcription)
```
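Since the card reports WER and CER, you may also want to score a transcription against its reference text. A minimal sketch continuing from the snippet above, using the `evaluate` library (illustrative only; this is not the card's own `./eval.py`):

```python
import evaluate

# Load standard word- and character-error-rate metrics
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Common Voice stores the reference transcript in the "sentence" column;
# `transcription` comes from the basic-usage snippet above.
predictions = [transcription]
references = [dataset["sentence"][0]]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```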
## Documentation
### Evaluation results on Common Voice 7 "test" (running ./eval.py)
#### With LM
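The with-LM figures are not reproduced here. For reference, LM-boosted decoding of this model's CTC logits is typically done with pyctcdecode and a KenLM n-gram model. A hedged sketch, continuing from the basic-usage snippet; the `hindi_5gram.arpa` file is a hypothetical language model, and special tokens in the vocabulary may need remapping in practice:

```python
from pyctcdecode import build_ctcdecoder

# Build a beam-search decoder over the tokenizer's vocabulary, ordered by token id
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# "hindi_5gram.arpa" is a hypothetical KenLM language-model file
decoder = build_ctcdecoder(labels, kenlm_model_path="hindi_5gram.arpa")

# Decode the logits from the basic-usage snippet with LM shallow fusion
text_with_lm = decoder.decode(logits[0].cpu().numpy())
print(text_with_lm)
```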
### Model description
More information needed
### Intended uses & limitations
More information needed
### Training and evaluation data
More information needed
### Training procedure
#### Training hyperparameters
The following hyperparameters were used during training (a rough mapping onto `TrainingArguments` follows the list):
- learning_rate: 7.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 50.0
- mixed_precision_training: Native AMP
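As a reference point, these settings map roughly onto `transformers.TrainingArguments` as shown below. This is a hedged sketch, not the actual training script (which is not part of this card); the output path is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-hindi",  # hypothetical output path
    learning_rate=7.5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=50.0,
    fp16=True,  # native AMP mixed-precision training
)
```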
#### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---------------|-------|------|-----------------|-----|
| 5.3155 | 3.4 | 500 | 4.5582 | 1.0 |
| 3.3369 | 6.8 | 1000 | 3.4269 | 1.0 |
| 2.1785 | 10.2 | 1500 | 1.7191 | 0.8831 |
| 1.579 | 13.6 | 2000 | 1.3604 | 0.7647 |
| 1.3773 | 17.01 | 2500 | 1.2737 | 0.7519 |
| 1.3165 | 20.41 | 3000 | 1.2457 | 0.7401 |
| 1.2274 | 23.81 | 3500 | 1.3617 | 0.7301 |
| 1.1787 | 27.21 | 4000 | 1.2068 | 0.7010 |
| 1.1467 | 30.61 | 4500 | 1.2416 | 0.6946 |
| 1.0801 | 34.01 | 5000 | 1.2312 | 0.6990 |
| 1.0709 | 37.41 | 5500 | 1.2984 | 0.7138 |
| 1.0307 | 40.81 | 6000 | 1.2049 | 0.6871 |
| 1.0003 | 44.22 | 6500 | 1.1956 | 0.6841 |
| 1.004 | 47.62 | 7000 | 1.2101 | 0.6793 |
#### Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu113
- Datasets 1.18.1.dev0
- Tokenizers 0.11.0
## License
This model is licensed under the Apache-2.0 license.
## Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of facebook/wav2vec2-xls-r-300m for Hindi automatic speech recognition |
| Training Data | mozilla-foundation/common_voice_7_0 (Hindi subset) |
| Metrics | Wer, Cer |
| Model Name | shivam/wav2vec2-xls-r-hindi |