Whisper-Hindi-Large-V2 Open-Source Model - Free Support for Precise Hindi Speech Recognition

Whisper Hindi Large V2

Developed by vasista22

A Hindi speech recognition model fine-tuned from OpenAI Whisper-large-v2 on multiple public Hindi ASR corpora.

Speech Recognition | Other | Open Source License: Apache-2.0 | #Hindi speech recognition #Low word error rate #Multi-corpus fine-tuning

Downloads 1,488

Release Time: 1/14/2023

Model Overview

This model is an automatic speech recognition (ASR) model optimized for Hindi, capable of accurately transcribing Hindi speech into text.

Model Features

High-precision Hindi recognition

Achieves a word error rate (WER) of 6.8% on the Fleurs test set

Multi-dataset training

Trained using multiple Hindi ASR datasets including GramVaani, ULCA, and Shrutilipi

Fast inference support

Supports accelerated inference using whisper-jax

Model Capabilities

Hindi speech recognition

Long audio processing (supports chunking)

Speech-to-text
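Long recordings are transcribed by splitting the audio into overlapping windows. The chunk_length_s=30 argument in the usage examples enables this. The sketch below only illustrates how such window boundaries can be laid out. It assumes 30-second windows with a 5-second stride on each side, matching the transformers pipeline's default stride of chunk_length_s / 6 (5 s here). The helper chunk_windows is hypothetical. The real pipeline also reconciles the overlapping regions when merging the decoded text, which this sketch does not attempt.

```python
from typing import Iterator, Tuple


def chunk_windows(
    total_s: float, chunk_s: float = 30.0, stride_s: float = 5.0
) -> Iterator[Tuple[float, float]]:
    """Yield (start, end) times in seconds for overlapping windows.

    Hypothetical helper for illustration: consecutive windows overlap by
    2 * stride_s seconds, so each window starts chunk_s - 2 * stride_s
    seconds after the previous one.
    """
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    start = 0.0
    while True:
        end = start + chunk_s
        # The last window is clipped to the end of the audio.
        yield (start, min(end, total_s))
        if end >= total_s:
            break
        start += step


# Example: window boundaries for a 65-second recording.
print(list(chunk_windows(65.0)))
```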

Use Cases

Speech transcription

Hindi speech-to-text

Convert Hindi speech content into text transcripts

WER of 6.8% on the Fleurs test set and 10.98% on the Common Voice 11.0 test set

Voice assistants

Hindi voice command recognition

Used for voice command recognition in Hindi voice assistant systems

🚀 Whisper Hindi Large-v2

🚀 Quick Start

This model is a fine-tuned version of openai/whisper-large-v2, trained on Hindi data from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint.

NOTE: The code used to train this model is available for reuse in the whisper-finetune repository.

✨ Features

Fine-tuned on multiple publicly available Hindi ASR corpora.
The training code is open source and available for reuse.
Supports both evaluation on entire datasets and inference on single audio files.
Allows faster inference using whisper-jax.

📦 Installation

The model card does not list explicit installation steps. The inference and evaluation code relies on the whisper-finetune repository and on the transformers, torch, and whisper-jax libraries. Install these dependencies by following their official documentation.
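Before running the examples, it can help to confirm that the libraries are importable. The following hypothetical helper uses only the Python standard library. It assumes the import names torch, transformers, and whisper_jax, and it treats whisper-jax as optional because that library is only needed for the faster JAX path.

```python
import importlib.util
from typing import Dict, Iterable

# Import names (not pip package names) of the libraries the examples use.
REQUIRED = ("torch", "transformers")
# whisper-jax is imported as whisper_jax and is only needed for faster inference.
OPTIONAL = ("whisper_jax",)


def check_dependencies(required: Iterable[str] = REQUIRED,
                       optional: Iterable[str] = OPTIONAL) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None
            for name in (*required, *optional)}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```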

💻 Usage Examples

Basic Usage

To transcribe a single audio file with this model, use the following code snippet:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-hindi-large-v2", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
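When several files need transcribing, you can load the large-v2 checkpoint once and pass a list of paths, which avoids reloading the model for each file. The transformers pipeline then batches the inputs. The sketch below extends the single-file example under that assumption and keeps its forced_decoder_ids setup. The names build_pipeline and transcribe_files are hypothetical helpers. The default batch size of 4 is an illustrative choice that should be tuned to the available GPU memory.

```python
from typing import Callable, Iterable, List, Optional

MODEL_ID = "vasista22/whisper-hindi-large-v2"


def build_pipeline(model_id: str = MODEL_ID):
    """Load the ASR pipeline once, configured like the single-file example."""
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    asr = pipeline(task="automatic-speech-recognition", model=model_id,
                   chunk_length_s=30, device=device)
    # Force Hindi transcription, mirroring the single-file snippet.
    asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
        language="hi", task="transcribe")
    return asr


def transcribe_files(paths: Iterable[str], asr: Optional[Callable] = None,
                     batch_size: int = 4) -> List[str]:
    """Transcribe several audio files in batches and return one text per file."""
    if asr is None:
        asr = build_pipeline()
    # The pipeline returns one {"text": ...} dict per input when given a list.
    outputs = asr(list(paths), batch_size=batch_size)
    return [out["text"] for out in outputs]
```

For example, transcribe_files(["/path/to/a.wav", "/path/to/b.wav"]) loads the model and returns one transcript per file. Because the function accepts an already-built pipeline through the asr argument, repeated calls can share one loaded model. This also makes the function easy to test with a stand-in callable.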

Advanced Usage

For faster inference with Whisper models, you can use the whisper-jax library. Follow the installation steps in the whisper-jax repository before running the following code snippet:

>>> from whisper_jax import FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-hindi-large-v2", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

📚 Documentation

To evaluate this model on an entire dataset, use the evaluation code in the whisper-finetune repository. The same repository also provides scripts for faster inference with whisper-jax.

🔧 Technical Details

Training and evaluation data

Property	Details
Training Data	GramVaani ASR Corpus, ULCA ASR Corpus, Shrutilipi ASR Corpus, Google/Fleurs Train+Dev set
Evaluation Data	GramVaani ASR Corpus Test Set, Google/Fleurs Test Set
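The reported WER figures count the word substitutions (S), deletions (D), and insertions (I) needed to align a hypothesis with its reference. That count is divided by the number of reference words N, so WER = (S + D + I) / N. Below is a minimal pure-Python sketch of this definition, pooled over a whole test set. It assumes plain whitespace tokenization and no text normalization, so it is not guaranteed to reproduce the reported numbers, which may depend on normalization choices in the original evaluation. The helper names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    # prev[j] = distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word r is missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match when r == h)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over (reference, hypothesis) pairs."""
    errors = ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("the references contain no words")
    return errors / ref_words


# Toy (reference, hypothesis) pairs: 9 reference words, 1 substitution.
pairs = [
    ("मैं घर जा रहा हूँ", "मैं घर जा रहा हूँ"),
    ("आज मौसम अच्छा है", "आज मौसम अच्छा था"),
]
print(f"WER: {corpus_wer(pairs):.1%}")
```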

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.75e-05
train_batch_size: 8
eval_batch_size: 24
seed: 22
optimizer: adamw_bnb_8bit
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 25000
training_steps: 57000 (Initially set to 116255 steps)
mixed_precision_training: True
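With lr_scheduler_type set to linear, the learning rate rises linearly from 0 to its peak over the 25000 warmup steps. It then falls linearly towards 0 at the last scheduled step. The function below reproduces that shape using the same formula as transformers' get_linear_schedule_with_warmup. It also makes an assumption that the card does not state explicitly: the decay was scheduled over the initially planned 116255 steps, and training stopped at step 57000. Under that assumption, the learning rate at the final step would still be roughly 4.87e-06 rather than 0.

```python
def linear_warmup_lr(step: int, peak_lr: float = 0.75e-5,
                     warmup_steps: int = 25_000, total_steps: int = 116_255) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: shrink linearly from peak_lr at warmup_steps to 0 at total_steps
    # (assumed here to be the initially planned 116255 steps).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 12_500, 25_000, 57_000, 116_255):
    print(step, f"{linear_warmup_lr(step):.3e}")
```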

📄 License

This model is released under the Apache-2.0 license.

Acknowledgement

This work was done at Speech Lab, IIT Madras. The compute resources for this work were funded by the "Bhashini: National Language Translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
