Whisper Telugu Base
This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on Telugu data from multiple publicly available ASR corpora. It provides automatic speech recognition for Telugu, with a reported WER of 13.39 on the google/fleurs Telugu test split.
Quick Start
The model was fine-tuned as part of the Whisper fine-tuning sprint.
NOTE: The code for training this model can be reused from the whisper-finetune repository.
⨠Features
- Fine - tuned on diverse Telugu ASR corpuses.
- Supports both normal and faster inference methods.
Installation
The original README provides no specific installation steps. To use the model, you need Python libraries such as torch and transformers and, for faster inference, whisper-jax. You can install them via pip:
pip install torch transformers
pip install git+https://github.com/sanchit-gandhi/whisper-jax
Usage Examples
Basic Usage
To transcribe a single audio file with this model, use the following code snippet:
>>> import torch
>>> from transformers import pipeline
>>>
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"
>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-telugu-base", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")
>>> print('Transcription: ', transcribe(audio)["text"])
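The snippet above handles one file. As a minimal sketch of how the same callable could be applied to a folder of recordings, the helper below walks a directory and collects one transcription per file. The function name `transcribe_directory` and the suffix list are hypothetical and not part of this model card or of transformers; the only assumption about `transcribe` is that it accepts a file path and returns a dict with a `"text"` key, as the pipeline object above does.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# Assumed set of audio extensions; adjust to whatever your decoder supports.
AUDIO_SUFFIXES: Tuple[str, ...] = (".wav", ".flac", ".mp3")


def transcribe_directory(
    directory: str,
    transcribe: Callable[[str], dict],
    suffixes: Tuple[str, ...] = AUDIO_SUFFIXES,
) -> Dict[str, str]:
    """Run `transcribe` on every audio file directly inside `directory`.

    Returns a mapping from file name to transcribed text, in sorted
    file-name order. Files with other suffixes are skipped.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            # The callable is expected to return {"text": ...}.
            results[path.name] = transcribe(str(path))["text"]
    return results
```

With the pipeline from the snippet above, this would be called as `transcribe_directory("/path/to/folder", transcribe)`. Passing the callable in as an argument keeps the helper independent of any particular backend, so it can be tried with a stub before loading the model.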
Advanced Usage
For faster inference of Whisper models, the whisper-jax library can be used. Follow the installation steps in the whisper-jax repository (https://github.com/sanchit-gandhi/whisper-jax) before using the following code snippet:
>>> import jax.numpy as jnp
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline
>>>
>>> audio = "/path/to/audio.format"
>>> transcribe = FlaxWhisperPipline("vasista22/whisper-telugu-base", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")
>>> print('Transcription: ', transcribe(audio)["text"])
Documentation
Evaluation
To evaluate this model on an entire dataset, use the evaluation scripts in the whisper-finetune repository.
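For readers unfamiliar with the WER metric reported for this model, the following is a minimal, dependency-free sketch of how a corpus-level word error rate can be computed. It is not the evaluation code from the whisper-finetune repository: the function names are hypothetical, and it applies no text normalization, which real evaluation scripts may do and which can change the score.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total word edits / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_edits = 0
    total_words = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref_words = ref_text.split()
        total_edits += edit_distance(ref_words, hyp_text.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * total_edits / total_words


if __name__ == "__main__":
    # Placeholder strings, not real model output: one substitution and one
    # deletion against a four-word reference.
    print(corpus_wer(["a b c d"], ["a x c"]))  # 50.0
```

Summing edits and reference words over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.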
Faster Inference
The same repository also provides scripts for faster inference using whisper-jax.
Technical Details
Training and Evaluation Data
| Property | Details |
| --- | --- |
| Training Data | CSTD IIIT-H ASR Corpus, [ULCA ASR Corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus#telugu-labelled-total-duration-is-102593-hours), Shrutilipi ASR Corpus, [Microsoft Speech Corpus (Indian Languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Train+Dev set, Babel ASR Corpus |
| Evaluation Data | [Microsoft Speech Corpus (Indian Languages) Test Set](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Test Set, OpenSLR, Babel Test Set |
Training Hyperparameters
| Property | Details |
| --- | --- |
| learning_rate | 3.3e-05 |
| train_batch_size | 80 |
| eval_batch_size | 88 |
| seed | 22 |
| optimizer | adamw_bnb_8bit |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 15000 |
| training_steps | 24174 (terminated upon convergence; initially set to 85952 steps) |
| mixed_precision_training | True |
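To make the interaction between the warmup, the planned step count, and the early stop concrete, here is a small sketch of a linear schedule with warmup. It assumes the run followed the usual shape of such schedules (linear ramp from zero to the peak over the warmup steps, then linear decay to zero at the planned total) and that the decay was planned over the initial 85952 steps; neither assumption is confirmed by this model card, and the function name `linear_lr` is hypothetical.

```python
PEAK_LR = 3.3e-05       # learning_rate from the table above
WARMUP_STEPS = 15_000   # lr_scheduler_warmup_steps
PLANNED_STEPS = 85_952  # initially configured training_steps (assumed decay horizon)
STOPPED_AT = 24_174     # step at which training was terminated


def linear_lr(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = PLANNED_STEPS,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS, STOPPED_AT, PLANNED_STEPS):
        print(step, linear_lr(step))
```

Under these assumptions the rate at step 24174 is about 2.87e-05, roughly 87% of the peak, so training would have stopped fairly early in the decay phase.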
License
This project is licensed under the Apache-2.0 license.
Acknowledgement
This work was carried out at Speech Lab, IIT Madras. The compute resources were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
Model Index
| Property | Details |
| --- | --- |
| Model Name | Whisper Telugu Base - Vasista Sai Lodagala |
| Task | Automatic Speech Recognition |
| Dataset | google/fleurs (te_in split, test) |
| Metric (WER) | 13.39 |