Whisper Telugu Base: An Open-Source Automatic Speech Recognition Model for Telugu

Whisper Telugu Base

Developed by vasista22

A Telugu automatic speech recognition (ASR) model fine-tuned from OpenAI's whisper-base on multiple public Telugu datasets.

Speech Recognition | Open Source License: Apache-2.0 | #Telugu Speech Recognition #Low Word Error Rate #Multi-dialect Support

Downloads 279

Release Time: 12/20/2022

Model Overview

A specialized automatic speech recognition model for Telugu speech-to-text tasks, developed as part of the Whisper fine-tuning sprint.

Model Features

Multi-dataset Training

Trained on six Telugu ASR corpora, including CSTD IIIT-H, ULCA, and Shrutilipi.

Efficient Fine-tuning

Fine-tuned from the OpenAI whisper-base model and adapted to the characteristics of Telugu.

Fast Inference Support

Supports accelerated inference using whisper-jax to improve processing efficiency

Model Capabilities

Telugu speech recognition

Long audio processing (supports chunking)

Multi-domain speech transcription

Use Cases

Speech Transcription

Telugu Meeting Minutes

Convert Telugu meeting recordings into text transcripts

Reported Word Error Rate (WER): 13.39% on the google/fleurs Telugu test split (not measured on meeting recordings)

Voice Assistant Development

Used for developing Telugu voice assistants or chatbots

Educational Applications

Language Learning Tool

Helps learners practice Telugu pronunciation and listening

🚀 Whisper Telugu Base

This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on Telugu data from multiple publicly available ASR corpora. It reaches a word error rate (WER) of 13.39 on the google/fleurs Telugu test split.

🚀 Quick Start

This model is a fine-tuned version of openai/whisper-base on Telugu data sourced from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint.

NOTE: The code for training this model can be reused from the whisper-finetune repository.

✨ Features

Fine-tuned on diverse Telugu ASR corpora.
Supports both standard inference with transformers and faster inference with whisper-jax.

📦 Installation

No specific installation steps are provided in the original README. To use the model, you need the Python libraries torch and transformers and, for faster inference, whisper-jax. You can install them via pip:

pip install torch transformers
pip install git+https://github.com/sanchit-gandhi/whisper-jax
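
Before running the examples, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any of these libraries. It assumes whisper-jax is imported as `whisper_jax`, as in the usage examples.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the pip commands above.
    missing = missing_packages(["torch", "transformers", "whisper_jax"])
    print("Missing packages:", missing if missing else "none")
```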

💻 Usage Examples

Basic Usage

In order to infer a single audio file using this model, the following code snippet can be used:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-telugu-base", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
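
The `chunk_length_s=30` argument lets the pipeline handle audio longer than 30 seconds by splitting it into overlapping windows and merging the results. The sketch below only illustrates how such windows could be laid out. It is not the library's implementation: the function name `chunk_windows` is hypothetical, and the 5-second overlap on each side is an assumption chosen for illustration.

```python
def chunk_windows(total_s, chunk_length_s=30.0, stride_s=5.0):
    """Return (start, end) offsets in seconds of overlapping windows covering the audio.

    stride_s is the assumed overlap kept on each side of a window, so
    consecutive windows advance by chunk_length_s - 2 * stride_s seconds.
    """
    if chunk_length_s <= 2 * stride_s:
        raise ValueError("chunk_length_s must be larger than twice stride_s")
    if total_s <= 0:
        return []

    step = chunk_length_s - 2 * stride_s
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break  # the last window reaches the end of the audio
        start += step
    return windows


if __name__ == "__main__":
    for start, end in chunk_windows(70.0):
        print(f"window from {start:.0f} s to {end:.0f} s")
```

With these assumed values, a 70-second file is covered by three windows: 0 to 30 s, 20 to 50 s, and 40 to 70 s. The overlap gives the model context on both sides of each boundary, which tends to reduce errors where a word straddles two windows.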

Advanced Usage

For faster inference with Whisper models, the whisper-jax library can be used. Please follow the installation steps in the whisper-jax repository (https://github.com/sanchit-gandhi/whisper-jax) before using the following code snippet:

>>> import jax.numpy as jnp
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-telugu-base", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

📚 Documentation

Evaluation

To evaluate this model on an entire dataset, use the evaluation scripts in the whisper-finetune repository.
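
The metric reported for this model is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that computation for a single utterance. It is not the repository's evaluation script, the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two strings, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "నేను ఇంటికి వెళ్తున్నాను"
    hypothesis = "నేను బడికి వెళ్తున్నాను"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

In this example one of three reference words is substituted, so the WER is 1/3. For a whole dataset, WER is usually computed by summing the errors over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance scores.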

Faster Inference

The same repository also provides scripts for faster inference using whisper-jax.

🔧 Technical Details

Training and Evaluation Data

Property	Details
Training Data	CSTD IIIT-H ASR Corpus, [ULCA ASR Corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus#telugu-labelled-total-duration-is-102593-hours), Shrutilipi ASR Corpus, [Microsoft Speech Corpus (Indian Languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Train+Dev set, Babel ASR Corpus
Evaluation Data	[Microsoft Speech Corpus (Indian Languages) Test Set](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Test Set, OpenSLR, Babel Test Set

Training Hyperparameters

Property	Details
learning_rate	3.3e-05
train_batch_size	80
eval_batch_size	88
seed	22
optimizer	adamw_bnb_8bit
lr_scheduler_type	linear
lr_scheduler_warmup_steps	15000
training_steps	24174 (terminated upon convergence. Initially set to 85952 steps)
mixed_precision_training	True
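
To make the interaction between `learning_rate`, `lr_scheduler_type`, and `lr_scheduler_warmup_steps` concrete, the sketch below computes the learning rate at a given step under a linear schedule with warmup: a linear ramp from 0 to the peak over the warmup steps, followed by a linear decay to 0 at the final planned step. The function name `linear_lr` is hypothetical, and the sketch assumes the schedule was planned over the initial 85952 steps rather than the 24174 steps actually run.

```python
def linear_lr(step, peak_lr=3.3e-5, warmup_steps=15000, total_steps=85952):
    """Learning rate at `step` for linear warmup followed by linear decay.

    total_steps is assumed to be the initially configured number of training
    steps, since the decay slope is fixed before training starts.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 7500, 15000, 24174, 85952):
        print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under these assumptions, training stopped at step 24174 with a learning rate of about 2.87e-05, roughly 87% of the peak, because most of the planned decay had not yet happened.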

📄 License

This project is licensed under the Apache-2.0 license.

Acknowledgement

This work was carried out at Speech Lab, IIT Madras. The compute resources were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.

Model Index

Property	Details
Model Name	Whisper Telugu Base - Vasista Sai Lodagala
Task	Automatic Speech Recognition
Dataset	google/fleurs (te_in split, test)
Metric (WER)	13.39
