🚀 IndicWhisper with JAX (now faster)
IndicWhisper is a state-of-the-art speech recognition model fine-tuned on Indian languages. It offers pre-trained checkpoints for immediate use, along with code for training and evaluation.
🚀 Quick Start
To get started, load a pre-trained checkpoint with whisper-jax and transcribe audio as shown in the Usage Examples below. Code for training and evaluation is also included in this repository.
✨ Features
- High Performance: Achieves remarkable Word Error Rates (WERs) on various Indian language benchmarks, outperforming other publicly available models.
- JAX Mode: Recently added JAX support significantly speeds up inference on both TPUs and GPUs, making the model well suited for high-performance computing environments.
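For readers unfamiliar with the metric: Word Error Rate is the word-level edit distance between the model's hypothesis and the reference transcript, divided by the number of reference words. A minimal, self-contained sketch (not the evaluation code used for the benchmarks above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via dynamic programming over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words
```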
📖 Documentation
Overview
IndicWhisper attains impressive Word Error Rates (WERs) on multiple benchmarks for Indian languages. It surpasses other publicly accessible models, making it an invaluable tool for speech recognition tasks in Indian languages.
Performance on Vistaar Benchmark (Hindi Subset)
| Model | Kathbath | Kathbath-Hard | FLEURS | CommonVoice | IndicTTS | MUCS | Gramvaani | Average |
|---|---|---|---|---|---|---|---|---|
| Google STT | 14.3 | 16.7 | 19.4 | 20.8 | 18.3 | 17.8 | 59.9 | 23.9 |
| IndicWav2vec | 12.2 | 16.2 | 18.3 | 20.2 | 15.0 | 22.9 | 42.1 | 21.0 |
| Azure STT | 13.6 | 15.1 | 24.3 | 14.6 | 15.2 | 15.1 | 42.3 | 20.0 |
| Nvidia-medium | 14.0 | 15.6 | 19.4 | 20.4 | 12.3 | 12.4 | 41.3 | 19.4 |
| Nvidia-large | 12.7 | 14.2 | 15.7 | 21.2 | 12.2 | 11.8 | 42.6 | 18.6 |
| IndicWhisper | 10.3 | 12.0 | 11.4 | 15.0 | 7.6 | 12.0 | 26.8 | 13.6 |
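Assuming the Average column is the unweighted mean of the seven benchmark WERs, a quick sanity check in Python reproduces the reported figures for the first and last rows:

```python
# Per-benchmark WERs copied from the table above (Hindi subset of Vistaar):
# Kathbath, Kathbath-Hard, FLEURS, CommonVoice, IndicTTS, MUCS, Gramvaani.
indicwhisper = [10.3, 12.0, 11.4, 15.0, 7.6, 12.0, 26.8]
google_stt = [14.3, 16.7, 19.4, 20.8, 18.3, 17.8, 59.9]

def average_wer(wers):
    """Unweighted mean, rounded to one decimal as in the table."""
    return round(sum(wers) / len(wers), 1)

print(average_wer(indicwhisper))  # 13.6
print(average_wer(google_stt))    # 23.9
```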
💻 Usage Examples
Basic Usage
```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline  # note: "Pipline" is the library's spelling

# Load the fine-tuned IndicWhisper checkpoint in bfloat16 for faster inference.
pipeline = FlaxWhisperPipline('parthiv11/indic_whisper_nodcil', dtype=jnp.bfloat16)

# Transcribe an audio file; the result is a dict with the transcription under "text".
transcript = pipeline('sample.mp3')
print(transcript['text'])
```
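Whisper models operate on fixed 30-second inputs; whisper-jax handles longer files by splitting the audio into overlapping windows and batching them through the model. A minimal sketch of that chunking idea (`chunk_intervals` is a hypothetical helper for illustration, not part of the library's API):

```python
def chunk_intervals(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Yield (start, end) windows of chunk_s seconds, overlapping by overlap_s."""
    step = chunk_s - overlap_s
    start = 0.0
    while start < duration_s:
        yield (start, min(start + chunk_s, duration_s))
        start += step

# A 70-second file becomes three overlapping 30-second windows.
print(list(chunk_intervals(70.0)))
# [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```

The overlap lets the merge step reconcile words that straddle a window boundary, which is why chunked transcription stays accurate while remaining parallelizable.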
Acknowledgements
We are grateful to the following organizations for their support:
- EkStep Foundation for their generous grant, which enabled the establishment of the Centre for AI4Bharat at IIT Madras.
- The Ministry of Electronics and Information Technology for its grant under the National Language Translation Mission (NLTM) to support the creation of datasets and models for Indian languages under the Bhashini project.
- The Centre for Development of Advanced Computing, India (C-DAC), for providing access to the Param Siddhi supercomputer for training our models.
- Microsoft for its grant to create datasets, tools, and resources for Indian languages.
- The JAX guide on GitHub, which informed the JAX port.
📄 License
IndicWhisper and the associated Vistaar benchmark are MIT-licensed. This license applies to all the fine-tuned language models in this repository.
Contributors
- Kaushal Bhogale (AI4Bharat)
- Sai Narayan Sundaresan (IITKGP, AI4Bharat)
- Abhigyan Raman (AI4Bharat)
- Tahir Javed (IITM, AI4Bharat)
- Mitesh Khapra (IITM, AI4Bharat, RBCDSAI)
- Pratyush Kumar (Microsoft, AI4Bharat)
🤝 Contributing
We welcome contributions from the community to enhance IndicWhisper. If you have ideas, bug fixes, or enhancements, feel free to submit a pull request.
Thank you for your interest in IndicWhisper! We hope it serves as a valuable tool for your Indian language speech recognition needs.