IndicWhisper With JAX (Faster)
IndicWhisper is a state-of-the-art speech recognition model fine-tuned on Indian languages, offering pre-trained checkpoints and code for training and evaluation.
Quick Start
This repository contains the code for training and evaluating IndicWhisper, along with pre-trained checkpoints for immediate use.
Features
- High Performance: IndicWhisper achieves lower Word Error Rates (WERs) than other publicly available models on benchmarks for Indian languages; on the Hindi subset of Vistaar its average WER is 13.6, against 18.6 for the next-best model listed.
- JAX Mode: Recently added JAX support significantly improves inference speed on both TPUs and GPUs, making this the fastest Whisper implementation available.
Documentation
Overview
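IndicWhisper is evaluated by Word Error Rate (WER; lower is better) on benchmarks for Indian languages. On the Hindi subset of the Vistaar benchmark it has the lowest WER on five of the seven test sets and the lowest average among the publicly available models compared, which makes it a strong choice for speech recognition in Indian languages.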
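For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration, not the evaluation code used for this repository: the function name `word_error_rate` is hypothetical, and it skips any text normalization that a benchmark pipeline may apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One word dropped out of four reference words gives a WER of 0.25.
print(word_error_rate("a b c d", "a b c"))
```

Benchmark tables usually report this value multiplied by 100.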
Performance on Vistaar Benchmark (Hindi Subset)
| Model | Kathbath | Kathbath-Hard | FLEURS | CommonVoice | IndicTTS | MUCS | Gramvaani | Average |
|---|---|---|---|---|---|---|---|---|
| Google STT | 14.3 | 16.7 | 19.4 | 20.8 | 18.3 | 17.8 | 59.9 | 23.9 |
| IndicWav2vec | 12.2 | 16.2 | 18.3 | 20.2 | 15 | 22.9 | 42.1 | 21 |
| Azure STT | 13.6 | 15.1 | 24.3 | 14.6 | 15.2 | 15.1 | 42.3 | 20 |
| Nvidia-medium | 14 | 15.6 | 19.4 | 20.4 | 12.3 | 12.4 | 41.3 | 19.4 |
| Nvidia-large | 12.7 | 14.2 | 15.7 | 21.2 | 12.2 | 11.8 | 42.6 | 18.6 |
| IndicWhisper | 10.3 | 12.0 | 11.4 | 15.0 | 7.6 | 12 | 26.8 | 13.6 |
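To make the comparison concrete, the following sketch copies the per-dataset numbers from the table above and works out which model has the lowest WER on each test set, plus the IndicWhisper row average. The variable and function names are hypothetical helpers for illustration only; the figures are the ones reported in the table.

```python
DATASETS = ["Kathbath", "Kathbath-Hard", "FLEURS", "CommonVoice",
            "IndicTTS", "MUCS", "Gramvaani"]

# Per-dataset WER values copied from the table (Hindi subset of Vistaar).
WER = {
    "Google STT":    [14.3, 16.7, 19.4, 20.8, 18.3, 17.8, 59.9],
    "IndicWav2vec":  [12.2, 16.2, 18.3, 20.2, 15.0, 22.9, 42.1],
    "Azure STT":     [13.6, 15.1, 24.3, 14.6, 15.2, 15.1, 42.3],
    "Nvidia-medium": [14.0, 15.6, 19.4, 20.4, 12.3, 12.4, 41.3],
    "Nvidia-large":  [12.7, 14.2, 15.7, 21.2, 12.2, 11.8, 42.6],
    "IndicWhisper":  [10.3, 12.0, 11.4, 15.0, 7.6, 12.0, 26.8],
}


def best_model_per_dataset() -> dict:
    """Map each dataset to the (model, WER) pair with the lowest WER."""
    best = {}
    for column, dataset in enumerate(DATASETS):
        model = min(WER, key=lambda name: WER[name][column])
        best[dataset] = (model, WER[model][column])
    return best


def row_average(model: str) -> float:
    """Unweighted mean over the seven test sets, rounded to one decimal."""
    values = WER[model]
    return round(sum(values) / len(values), 1)


for dataset, (model, value) in best_model_per_dataset().items():
    print(f"{dataset:14s} {model:14s} {value}")

# The IndicWhisper average matches the table's Average column: 13.6
print(row_average("IndicWhisper"))
```

IndicWhisper has the lowest WER on five of the seven test sets; Azure STT is lower on CommonVoice and Nvidia-large is lower on MUCS.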
Usage Examples
Basic Usage
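from whisper_jax import FlaxWhisperPipline  # the class name is spelled this way in whisper_jax
import jax.numpy as jnp
pipeline = FlaxWhisperPipline('parthiv11/indic_whisper_hi_multi_gpu', dtype=jnp.bfloat16)  # load the checkpoint in half precision
transcript = pipeline('sample.mp3')  # the first call is JIT-compiled, so it is slower than later calls
print(transcript['text'])  # the result is a dict; the transcription is under 'text'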
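The basic call above returns only the full transcript. As a hedged sketch, the snippet below requests segment timestamps and formats them for display. It assumes the `task` and `return_timestamps` arguments and the `chunks` output key behave as described in the Whisper JAX README; this has not been verified against this particular checkpoint. The helper names `format_chunks` and `transcribe_with_timestamps` are hypothetical.

```python
def format_chunks(outputs: dict) -> list:
    """Turn the assumed 'chunks' list into printable '[start -> end] text' lines."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        # The end time of the final chunk may be missing in some pipelines.
        end_text = f"{end:.2f}s" if end is not None else "end"
        lines.append(f"[{start:.2f}s -> {end_text}] {chunk['text'].strip()}")
    return lines


def transcribe_with_timestamps(audio_path: str,
                               checkpoint: str = "parthiv11/indic_whisper_hi_multi_gpu") -> dict:
    """Run the pipeline with timestamps enabled (requires whisper_jax and jax)."""
    # Imported lazily so format_chunks can be used without JAX installed.
    from whisper_jax import FlaxWhisperPipline
    import jax.numpy as jnp

    pipeline = FlaxWhisperPipline(checkpoint, dtype=jnp.bfloat16)
    # Assumed keyword arguments, following the Whisper JAX README.
    return pipeline(audio_path, task="transcribe", return_timestamps=True)


# Example with a hand-written result in the assumed output shape:
sample = {"text": "hello world",
          "chunks": [{"timestamp": (0.0, 2.5), "text": " hello world"}]}
print("\n".join(format_chunks(sample)))
```

With the libraries installed, `format_chunks(transcribe_with_timestamps('sample.mp3'))` would produce one line per segment.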
Acknowledgements
We would like to express our gratitude to the following organizations for their support:
- EkStep Foundation for their generous grant, which facilitated the establishment of the Centre for AI4Bharat at IIT Madras.
- The Ministry of Electronics and Information Technology (NLTM) for its grant to support the creation of datasets and models for Indian languages under the Bhashini project.
- The Centre for Development of Advanced Computing, India (C-DAC), for providing access to the Param Siddhi supercomputer for training our models.
- Microsoft for its grant to create datasets, tools, and resources for Indian languages.
- The Whisper JAX project on GitHub for its JAX guide.
License
IndicWhisper and the associated Vistaar benchmark are MIT-licensed. This license applies to all the fine-tuned language models included in this repository.
Contributors
- Kaushal Bhogale (AI4Bharat)
- Sai Narayan Sundaresan (IITKGP, AI4Bharat)
- Abhigyan Raman (AI4Bharat)
- Tahir Javed (IITM, AI4Bharat)
- Mitesh Khapra (IITM, AI4Bharat, RBCDSAI)
- Pratyush Kumar (Microsoft, AI4Bharat)
Contributing
We welcome contributions from the community to further improve IndicWhisper. If you have any ideas, bug fixes, or enhancements, please feel free to submit a pull request.
Thank you for your interest in IndicWhisper! We hope it proves to be a valuable tool for your speech recognition needs in Indian languages.