Whisper-Tamil-Large-V2: An open-source model for accurate Tamil speech recognition, available for free.

Whisper Tamil Large V2

Developed by vasista22

A Tamil speech recognition model fine-tuned from OpenAI Whisper-large-v2 on multiple public Tamil ASR corpora.

Category: Speech Recognition
License: Apache-2.0 (open source)
Tags: #Tamil speech recognition #Multi-dialect adaptation #Low word error rate

Downloads 325

Release Time: 1/1/2023

Model Overview

An automatic speech recognition model optimized for Tamil, suitable for transcription tasks across various accents and dialects

Model Features

Multi-dataset fine-tuning

Trained on 6 different sources of Tamil ASR datasets, covering a wide range of speech characteristics

Low word error rate

Achieves a word error rate (WER) of 6.61% on the Common Voice 11.0 test set and 7.5% on the Fleurs test set

Efficient inference support

Provides two inference solutions: standard transformers and whisper-jax, supporting batch processing and GPU acceleration

Model Capabilities

Tamil speech transcription

Long audio processing (supports chunking)

Accent adaptation

Use Cases

Speech transcription services

Tamil media content subtitle generation

Automatically generates subtitles for video/podcast media content

Achieves a 6.61% WER on the Common Voice 11.0 test set

Voice assistant development

Tamil voice command recognition

Used to develop smart voice assistants supporting Tamil

🚀 Whisper Tamil Large-v2

This model is a fine-tuned version of openai/whisper-large-v2, trained on Tamil data from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint to improve Tamil automatic speech recognition.

NOTE: The code for training this model can be reused from the whisper-finetune repository.

🚀 Quick Start

✨ Features

Fine-tuned on multiple Tamil ASR corpora.
Code for training and evaluation is publicly available for re-use.
Supports faster inference with whisper-jax.

📦 Installation

The original README does not list installation steps. Refer to the whisper-finetune repository for the relevant setup instructions and scripts.

💻 Usage Examples

Basic Usage

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-tamil-large-v2", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="ta", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
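The `chunk_length_s=30` argument above makes the pipeline split long recordings into 30-second windows before transcription. The sketch below illustrates the general idea of overlapping windows in plain Python. It is a simplified illustration, not the transformers implementation: the helper name `chunk_windows` is hypothetical, and the 5-second overlap on each side is an assumed value chosen for the example.

```python
def chunk_windows(duration_s, chunk_length_s=30.0, stride_s=5.0):
    """Return (start, end) windows in seconds that cover the audio with overlap.

    Hypothetical helper for illustration; stride_s is an assumed overlap
    kept on each side of a window so neighbouring chunks share context.
    """
    if chunk_length_s <= 2 * stride_s:
        raise ValueError("chunk_length_s must be larger than twice stride_s")

    # Each new window advances by the chunk length minus both overlaps.
    step = chunk_length_s - 2 * stride_s
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        # Stop once the current window reaches the end of the audio.
        if start + chunk_length_s >= duration_s:
            break
        start += step
    return windows


if __name__ == "__main__":
    for window in chunk_windows(70.0):
        print(window)
```

Audio shorter than one chunk yields a single window, so short clips are transcribed in one pass.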

Advanced Usage

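>>> # faster inference with whisper-jax
>>> import jax.numpy as jnp
>>> # note: "FlaxWhisperPipline" is the class name as spelled in whisper-jax
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline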

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-tamil-large-v2", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="ta", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

📚 Documentation

Training and evaluation data

Property	Details
Training Data	IISc-MILE Tamil ASR Corpus, [ULCA ASR Corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus#tamil-labelled--total-duration-is-116024-hours), Shrutilipi ASR Corpus, [Microsoft Speech Corpus (Indian Languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Train+Dev set, Babel ASR Corpus
Evaluation Data	[Microsoft Speech Corpus (Indian Languages) Test Set](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Test Set, IISc-MILE Test Set, Babel Test Set

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.75e-05
train_batch_size: 8
eval_batch_size: 24
seed: 22
optimizer: adamw_bnb_8bit
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 22000
training_steps: 52500 (Initially set to 76000 steps)
mixed_precision_training: True
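To make the `linear` scheduler with warmup more concrete, here is a minimal sketch of how the learning rate could evolve under these settings. It assumes the common convention of a linear ramp from zero to the peak rate during warmup, followed by a linear decay to zero. The function name is hypothetical, and using 76000 as the decay horizon is an assumption: the source only says training was initially set to 76000 steps and ran for 52500.

```python
def linear_warmup_decay_lr(step, peak_lr=0.75e-5, warmup_steps=22000, total_steps=76000):
    """Learning rate at a given optimizer step (illustrative sketch).

    total_steps=76000 is an assumption: it is the initially configured
    number of steps, not confirmed as the scheduler's actual horizon.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 11000, 22000, 52500, 76000):
        print(step, linear_warmup_decay_lr(step))
```

Under this assumption, training that stops at step 52500 ends before the rate has decayed to zero.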

🔧 Technical Details

This model is a fine-tuned version of openai/whisper-large-v2 on Tamil data. The fine-tuning was carried out as part of the Whisper fine-tuning sprint. The training and evaluation code is available in the whisper-finetune repository.

📄 License

This model is licensed under the Apache-2.0 license.

Model Performance

Task	Dataset	WER (%)
Automatic Speech Recognition	google/fleurs (ta_in test split)	7.5
Automatic Speech Recognition	mozilla-foundation/common_voice_11_0 (ta test split)	6.61
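For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the model output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that definition for a single utterance. The function name is hypothetical, and it applies no text normalization, so it will not necessarily reproduce the figures above, which may have been computed with a different evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 100% and is not simply the complement of word accuracy.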

Acknowledgement

This work was done at Speech Lab, IIT Madras. The compute resources were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
