Whisper-Kannada-Tiny Open-Source Speech Recognition Model - Free Automatic Speech Recognition for Kannada

Whisper Kannada Tiny

Developed by vasista22

A Kannada automatic speech recognition model fine-tuned from openai/whisper-tiny on multiple public Kannada ASR corpora

Speech Recognition | Other | Open Source License: Apache-2.0 | #Kannada Speech Recognition #Low Word Error Rate #Multi-dataset Fine-tuning

Downloads 119

Release Time: 12/19/2022

Model Overview

An automatic speech recognition model optimized for Kannada, suitable for speech-to-text tasks

Model Features

Kannada Optimization

Specially fine-tuned for Kannada speech characteristics

Multi-dataset Training

Trained using multiple public Kannada ASR corpora

Efficient Inference

Supports whisper-jax for fast batch inference

Model Capabilities

Kannada Speech Recognition

Long Audio Processing (supports chunking)

Real-time Transcription

Use Cases

Speech Transcription

Meeting Minutes

Convert Kannada meeting recordings into text records

Word Error Rate 13.38% (on Fleurs test set)

Media Caption Generation

Generate subtitles for Kannada video content

🚀 Whisper Kannada Tiny

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on Kannada data drawn from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint.

🚀 Quick Start

NOTE: The code used to train this model is available for re-use in the [whisper-finetune](https://github.com/vasistalodagala/whisper-finetune) repository.

✨ Features

Fine-tuned on Kannada data from multiple ASR corpora.
Training code is available for re-use.
Can be evaluated on an entire dataset using the provided evaluation scripts.
Supports faster inference with whisper-jax.

📦 Installation

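The model card lists no installation steps. The usage examples below import `torch` and `transformers`; the faster-inference example additionally imports `jax` and `whisper_jax`.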

💻 Usage Examples

Basic Usage

To transcribe a single audio file with this model, use the following code snippet:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-kannada-tiny", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
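The `chunk_length_s=30` argument in the snippet above makes the pipeline split long recordings into 30-second windows before transcription. The sketch below illustrates the idea by computing overlapping window boundaries in samples. It is a simplified illustration, not the pipeline's actual implementation: the function name `chunk_spans` is hypothetical, and the 16 kHz sampling rate and 5-second overlap on each side are assumptions chosen for the example.

```python
def chunk_spans(num_samples, sampling_rate=16_000, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) sample indices of overlapping chunks covering the audio.

    Consecutive chunks overlap by 2 * stride_length_s seconds, so each boundary
    region is seen by two chunks and can be reconciled when merging the text.
    """
    if num_samples <= 0:
        return []
    chunk_len = int(chunk_length_s * sampling_rate)
    stride = int(stride_length_s * sampling_rate)
    step = chunk_len - 2 * stride  # how far each new chunk advances
    if step <= 0:
        raise ValueError("chunk_length_s must be more than twice stride_length_s")

    spans = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        spans.append((start, end))
        if end >= num_samples:  # the last chunk reaches the end of the audio
            break
        start += step
    return spans


# A 70-second recording at 16 kHz is covered by three overlapping chunks.
for start, end in chunk_spans(70 * 16_000):
    print(f"{start / 16_000:.0f}s - {end / 16_000:.0f}s")
```

Recordings shorter than one window yield a single span, so short clips pass through unchanged.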

Advanced Usage

For faster inference with Whisper models, the [whisper-jax](https://github.com/sanchit-gandhi/whisper-jax) library can be used. Follow the installation steps described [here](https://github.com/vasistalodagala/whisper-finetune#faster-evaluation-with-whisper-jax) before running the following code snippet:

>>> import jax.numpy as jnp
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-kannada-tiny", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

📚 Documentation

Training and evaluation data

| Property | Details |
|---|---|
| Training Data | IISc-MILE Kannada ASR Corpus, [ULCA ASR Corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus#kannada-labelled-total-duration-is-60891-hours), Shrutilipi ASR Corpus, Google/Fleurs Train+Dev set |
| Evaluation Data | Google/Fleurs Test Set, IISc-MILE Test Set, OpenSLR |

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 88
eval_batch_size: 88
seed: 22
optimizer: adamw_bnb_8bit
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10000
training_steps: 15008 (terminated upon convergence; initially set to 51570 steps)
mixed_precision_training: True
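To make the interaction between `learning_rate`, `lr_scheduler_type: linear`, and `lr_scheduler_warmup_steps` concrete, the sketch below computes the learning rate at a given step for a linear warmup followed by a linear decay to zero. It assumes the decay horizon was the initially planned 51570 steps, which the model card does not state explicitly; the function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step, base_lr=5e-5, warmup_steps=10_000, total_steps=51_570):
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    Assumption: total_steps is the originally planned horizon (51570),
    not the step at which training was actually stopped (15008).
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr at warmup_steps down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 5_000, 10_000, 15_008, 51_570):
    print(step, linear_warmup_lr(step))
```

Under that assumption, two thirds of the 15008 completed steps fell inside the warmup phase, and the learning rate had decayed only to about 4.4e-05 when training stopped.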

🔧 Technical Details

The model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on Kannada data. It was trained with the hyperparameters listed above and evaluated on the datasets listed above.

📄 License

This model is released under the Apache 2.0 license.

Acknowledgement

This work was done at Speech Lab, IIT Madras. The compute resources for this work were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
