Whisper-Tamil-Small: Open-Source Tamil Speech Recognition Model for Speech-to-Text Conversion

Whisper Tamil Small

Developed by vasista22

A Tamil automatic speech recognition model fine-tuned from OpenAI Whisper-small on multiple public datasets, reaching a word error rate (WER) of 7.95 on the Common Voice 11.0 Tamil test set and 9.11 on the Fleurs test set.

Speech Recognition | Other | Open Source License: Apache-2.0
Tags: #Tamil speech recognition #Low word error rate #Multi-corpus fine-tuning

Downloads 10.78k

Release Time : 1/1/2023

Model Overview

This is an automatic speech recognition (ASR) model for Tamil, fine-tuned from the Whisper-small checkpoint and suited to Tamil speech-to-text tasks.

Model Features

Low word error rate

WER of only 7.95 on the Common Voice 11.0 Tamil test set and 9.11 on the Fleurs test set.

Multi-dataset training

Incorporates training data from 6 mainstream Tamil ASR datasets.

Accelerated inference support

Provides JAX-accelerated inference solutions based on whisper-jax, supporting batch processing.

Model Capabilities

Tamil speech recognition

Long audio processing (supports chunking)

Real-time transcription
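
The chunking mentioned under long audio processing can be pictured as cutting a recording into fixed-length, slightly overlapping windows that are transcribed one by one. The sketch below only computes the window boundaries; the function name chunk_bounds and the 5-second overlap are illustrative assumptions, not the values or code the transformers pipeline uses internally when chunk_length_s is set.

```python
def chunk_bounds(total_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds that cover [0, total_s].

    Hypothetical helper: chunk_s mirrors the 30-second window used in the
    usage example; overlap_s is an assumed value chosen for illustration.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    step = chunk_s - overlap_s  # how far each new window advances
    bounds = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:  # the last window reaches the end of the audio
            break
        start += step
    return bounds


if __name__ == "__main__":
    # A 70-second recording split into 30-second windows with 5 s overlap.
    for start, end in chunk_bounds(70.0):
        print(f"{start:5.1f} s -> {end:5.1f} s")
```

The overlap gives neighbouring windows shared context, so words that straddle a boundary appear whole in at least one window.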

Use Cases

Speech transcription

Meeting minutes

Convert Tamil meeting recordings into text transcripts.

Reported WER on the benchmark test sets is 7.95 (Common Voice 11.0) and 9.11 (Fleurs).

Media subtitle generation

Automatically generate subtitles for Tamil video content.

Reported WER is below 10% on the Common Voice 11.0 and Fleurs test sets.

Voice assistants

Tamil voice command recognition

Used for localized voice assistant development.

🚀 Whisper Tamil Small

This model is a fine-tuned version of openai/whisper-small on Tamil data from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint to improve automatic speech recognition for Tamil.

NOTE: The code for training this model can be reused from the whisper-finetune repository.

🚀 Quick Start

To evaluate this model on an entire dataset, use the evaluation code in the whisper-finetune repository. The same repository also offers scripts for faster inference with whisper-jax.

✨ Features

Fine-tuned on multiple Tamil ASR corpora for better Tamil speech recognition.
Reusable training code available in a public repository.
Support for faster inference using whisper-jax.

📦 Installation

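The original model card lists no installation steps. The usage examples below import torch and transformers, and the faster-inference example additionally needs whisper-jax, so these packages must be installed first.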

💻 Usage Examples

Basic Usage

To infer a single audio file using this model, use the following code snippet:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

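>>> # chunk_length_s=30 lets the pipeline split long recordings into 30-second windows
>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-tamil-small", chunk_length_s=30, device=device)
>>> # force Tamil transcription instead of language detection or translation
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="ta", task="transcribe")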

>>> print('Transcription: ', transcribe(audio)["text"])
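
For the subtitle use case, the transformers ASR pipeline can be called with return_timestamps=True, in which case the result carries a "chunks" list whose items hold a "timestamp" pair and a "text" string. The sketch below turns such a list into SubRip (SRT) text. The helper names to_srt_time and chunks_to_srt are hypothetical, and whether this fine-tuned checkpoint emits reliable timestamps is not stated in the source, so treat it as an outline under that assumption rather than a tested recipe.

```python
def to_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Build SRT text from items shaped like {"timestamp": (start, end), "text": str}."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk may have no end time; fall back to its start.
            end = start
        text = chunk["text"].strip()
        blocks.append(f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written chunks standing in for transcribe(audio, return_timestamps=True)["chunks"].
    demo = [
        {"timestamp": (0.0, 2.5), "text": " வணக்கம்"},
        {"timestamp": (2.5, 5.0), "text": " நன்றி"},
    ]
    print(chunks_to_srt(demo))
```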

Advanced Usage

For faster inference with Whisper models, use the whisper-jax library. Complete the installation steps described in the whisper-jax repository before running the following code:

>>> # FlaxWhisperPipline is the class name as spelled in whisper-jax
>>> from whisper_jax import FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-tamil-small", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="ta", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

📚 Documentation

Training and evaluation data

Training Data

IISc-MILE Tamil ASR Corpus
ULCA ASR Corpus
Shrutilipi ASR Corpus
Microsoft Speech Corpus (Indian Languages)
Google/Fleurs Train+Dev set
Babel ASR Corpus

Evaluation Data

Microsoft Speech Corpus (Indian Languages) Test Set
Google/Fleurs Test Set
IISc-MILE Test Set
Babel Test Set

Training hyperparameters

The following hyperparameters were used during training:

Property	Details
learning_rate	1.7e-05
train_batch_size	48
eval_batch_size	32
seed	22
optimizer	adamw_bnb_8bit
lr_scheduler_type	linear
lr_scheduler_warmup_steps	17500
training_steps	29659 (Initially set to 84740 steps)
mixed_precision_training	True
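
To give a feel for what the linear scheduler with warmup in the table implies, the sketch below computes the learning rate at a given step. It assumes the common shape of linear warmup from zero followed by linear decay to zero, and it assumes the decay horizon was the initially configured 84740 steps; the source states only that training ran for 29659 steps, so both points are assumptions. The function name linear_lr is hypothetical.

```python
def linear_lr(step, peak_lr=1.7e-05, warmup_steps=17500, total_steps=84740):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Defaults come from the hyperparameter table; using 84740 as the decay
    horizon is an assumption, since training stopped at step 29659.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 8750, 17500, 29659, 84740):
        print(step, linear_lr(step))
```

Under these assumptions the warmup covers more than half of the 29659 steps actually run, so the model spent most of its training at or near the peak learning rate.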

🔧 Technical Details

The model is a fine-tuned version of openai/whisper-small on Tamil data. The fine-tuning was part of the Whisper fine-tuning sprint. The code for training and evaluation is available in the whisper-finetune repository.

📄 License

This model is licensed under the Apache 2.0 license.

Acknowledgement

This work was done at Speech Lab, IIT Madras. The compute resources were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.

Model Index

Task	Dataset	WER
Automatic Speech Recognition	google/fleurs (ta_in test split)	9.11
Automatic Speech Recognition	mozilla-foundation/common_voice_11_0 (ta test split)	7.95
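
The WER figures in the table are word error rates: the number of word substitutions, deletions and insertions needed to turn the model output into the reference transcript, divided by the number of reference words and expressed as a percentage. The sketch below computes it for a single sentence pair with a standard edit-distance table. The function name word_error_rate is hypothetical, and the evaluation scripts in the whisper-finetune repository may normalize text or aggregate over a corpus differently, so numbers from this sketch are not guaranteed to match the table.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```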
