Whisper Telugu Medium
This model is a fine-tuned version of openai/whisper-medium, trained on Telugu data from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint and performs automatic speech recognition for the Telugu language.
NOTE: The code for training this model can be reused from the whisper-finetune repository.
Quick Start
Usage Examples
Basic Usage
>>> import torch
>>> from transformers import pipeline
>>>
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"
>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-telugu-medium", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")
>>> print('Transcription: ', transcribe(audio)["text"])
Advanced Usage
>>> import jax.numpy as jnp
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline
>>>
>>> audio = "/path/to/audio.format"
>>> transcribe = FlaxWhisperPipline("vasista22/whisper-telugu-medium", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")
>>> print('Transcription: ', transcribe(audio)["text"])
To evaluate this model on an entire dataset, you can use the evaluation scripts in the whisper-finetune repository. The same repository also has scripts for faster inference using whisper-jax.
For faster inference with Whisper models, you can use the whisper-jax library. Make sure to complete its installation steps before running the Advanced Usage snippet above.
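As a rough illustration of what dataset-level evaluation involves, the sketch below computes a corpus-level word error rate (WER) over (audio path, reference text) pairs. It is not the code from the whisper-finetune repository: `edit_distance`, `corpus_wer`, and `transcribe_fn` are hypothetical names, the file names and sentences are placeholders, and no text normalization is applied, so the numbers it produces may differ from those of the repository's scripts.

```python
from typing import Callable, Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]],
               transcribe_fn: Callable[[str], str]) -> float:
    """WER in percent, pooled over all (audio_path, reference_text) pairs."""
    errors = 0
    ref_words = 0
    for audio_path, reference in pairs:
        ref = reference.split()
        hyp = transcribe_fn(audio_path).split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("no reference words to score against")
    return 100.0 * errors / ref_words


# Stand-in for the real pipeline so the sketch runs without downloading the model.
fake_outputs = {"a.wav": "the cat sat", "b.wav": "hello word"}
dataset = [("a.wav", "the cat sat"), ("b.wav", "hello world")]
print(corpus_wer(dataset, fake_outputs.get))  # 20.0 (1 error over 5 reference words)
```

With the pipeline from the Basic Usage snippet, `transcribe_fn` could be `lambda path: transcribe(path)["text"]`.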
Installation
To use the model, install the transformers and torch libraries; whisper-jax is optional and only needed for faster inference. The core libraries can be installed with pip:
pip install transformers torch
To use whisper-jax for faster inference:
pip install git+https://github.com/sanchit-gandhi/whisper-jax.git
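After installing, a quick check like the following can confirm which of the packages are importable in the current environment. This is only a convenience sketch: `missing_packages` is a hypothetical helper, and it tests importability, not version compatibility.

```python
from importlib.util import find_spec


def missing_packages(names=("transformers", "torch", "whisper_jax")):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# whisper_jax is optional; it is only needed for the Advanced Usage snippet.
print("Missing:", missing_packages() or "none")
```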
Documentation
Training and evaluation data
Training Data
- CSTD IIIT-H ASR Corpus
- [ULCA ASR Corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus#telugu-labelled-total-duration-is-102593-hours)
- Shrutilipi ASR Corpus
- [Microsoft Speech Corpus (Indian Languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e)
- Google/Fleurs Train+Dev set
- Babel ASR Corpus
Evaluation Data
- [Microsoft Speech Corpus (Indian Languages) Test Set](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e)
- Google/Fleurs Test Set
- OpenSLR
- Babel Test Set
Training hyperparameters
| Property | Details |
|---|---|
| learning_rate | 1e-05 |
| train_batch_size | 24 |
| eval_batch_size | 48 |
| seed | 22 |
| optimizer | adamw_bnb_8bit |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 15000 |
| training_steps | 35808 (terminated upon convergence; initially set to 89520 steps) |
| mixed_precision_training | True |
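To make the schedule concrete, here is a minimal sketch of a linear scheduler with warmup, using the learning rate and warmup steps listed above. It assumes the decay horizon is the initially configured 89520 steps, not the 35808 steps at which training stopped; the card does not state this explicitly. `linear_warmup_lr` is a hypothetical helper, not a function from the training code.

```python
def linear_warmup_lr(step: int,
                     peak_lr: float = 1e-5,
                     warmup_steps: int = 15000,
                     total_steps: int = 89520) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0.

    total_steps=89520 is an assumption (the initially configured step count).
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, mid-warmup, end of warmup, and the stopping step.
for step in (0, 7500, 15000, 35808):
    print(step, linear_warmup_lr(step))
```

Under this assumption, the learning rate had decayed to roughly 72% of its peak when training was terminated at step 35808.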
Acknowledgement
This work was done at Speech Lab, IIT Madras. The compute resources were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
License
This model is released under the Apache-2.0 license.
Model Index
| Property | Details |
|---|---|
| Model Name | Whisper Telugu Medium - Vasista Sai Lodagala |
| Task Type | Automatic Speech Recognition |
| Dataset | google/fleurs (te_in split, test set) |
| Metric (WER) | 9.47 |