🚀 IndicWhisper with JAX (now faster)
IndicWhisper is a state-of-the-art speech recognition model fine-tuned on Indian languages. It offers pre-trained checkpoints for immediate use, along with code for training and evaluation.
🚀 Quick Start
To get started, load a pre-trained checkpoint with whisper-jax and transcribe audio as shown in the Usage Examples below. Code for training and evaluation is also included in this repository.
✨ Features
- High Performance: Achieves remarkable Word Error Rates (WERs) on various Indian language benchmarks, outperforming other publicly available models.
- JAX Mode: Recently added JAX support significantly speeds up inference on both TPUs and GPUs, making the model well suited for high-performance computing environments.
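For readers unfamiliar with the metric: Word Error Rate is the word-level edit distance between the model's hypothesis and the reference transcript, divided by the number of reference words. A minimal, self-contained sketch (not the evaluation code used for the benchmarks above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via dynamic programming over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words
```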
📖 Documentation
Overview
IndicWhisper attains impressive Word Error Rates (WERs) on multiple benchmarks for Indian languages. It surpasses other publicly accessible models, making it an invaluable tool for speech recognition tasks in Indian languages.
Performance on Vistaar Benchmark (Hindi Subset)
| Model | Kathbath | Kathbath-Hard | FLEURS | CommonVoice | IndicTTS | MUCS | Gramvaani | Average |
|---|---|---|---|---|---|---|---|---|
| Google STT | 14.3 | 16.7 | 19.4 | 20.8 | 18.3 | 17.8 | 59.9 | 23.9 |
| IndicWav2vec | 12.2 | 16.2 | 18.3 | 20.2 | 15.0 | 22.9 | 42.1 | 21.0 |
| Azure STT | 13.6 | 15.1 | 24.3 | 14.6 | 15.2 | 15.1 | 42.3 | 20.0 |
| Nvidia-medium | 14.0 | 15.6 | 19.4 | 20.4 | 12.3 | 12.4 | 41.3 | 19.4 |
| Nvidia-large | 12.7 | 14.2 | 15.7 | 21.2 | 12.2 | 11.8 | 42.6 | 18.6 |
| IndicWhisper | 10.3 | 12.0 | 11.4 | 15.0 | 7.6 | 12.0 | 26.8 | 13.6 |
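Assuming the Average column is the unweighted mean of the seven benchmark WERs, a quick sanity check in Python reproduces the reported figures for the first and last rows:

```python
# Per-benchmark WERs copied from the table above (Hindi subset of Vistaar):
# Kathbath, Kathbath-Hard, FLEURS, CommonVoice, IndicTTS, MUCS, Gramvaani.
indicwhisper = [10.3, 12.0, 11.4, 15.0, 7.6, 12.0, 26.8]
google_stt = [14.3, 16.7, 19.4, 20.8, 18.3, 17.8, 59.9]

def average_wer(wers):
    """Unweighted mean, rounded to one decimal as in the table."""
    return round(sum(wers) / len(wers), 1)

print(average_wer(indicwhisper))  # 13.6
print(average_wer(google_stt))    # 23.9
```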
💻 Usage Examples
Basic Usage
```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline  # note: "Pipline" is the library's spelling

# Load the fine-tuned IndicWhisper checkpoint in bfloat16 for faster inference.
pipeline = FlaxWhisperPipline('parthiv11/indic_whisper_nodcil', dtype=jnp.bfloat16)

# Transcribe an audio file; the result is a dict with the transcription under "text".
transcript = pipeline('sample.mp3')
print(transcript['text'])
```
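Whisper models operate on fixed 30-second inputs; whisper-jax handles longer files by splitting the audio into overlapping windows and batching them through the model. A minimal sketch of that chunking idea (`chunk_intervals` is a hypothetical helper for illustration, not part of the library's API):

```python
def chunk_intervals(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Yield (start, end) windows of chunk_s seconds, overlapping by overlap_s."""
    step = chunk_s - overlap_s
    start = 0.0
    while start < duration_s:
        yield (start, min(start + chunk_s, duration_s))
        start += step

# A 70-second file becomes three overlapping 30-second windows.
print(list(chunk_intervals(70.0)))
# [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```

The overlap lets the merge step reconcile words that straddle a window boundary, which is why chunked transcription stays accurate while remaining parallelizable.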
Acknowledgements
We are grateful to the following organizations for their support:
- EkStep Foundation for their generous grant, which enabled the establishment of the Centre for AI4Bharat at IIT Madras.
- The Ministry of Electronics and Information Technology for its grant under the National Language Translation Mission (NLTM) to support the creation of datasets and models for Indian languages under the Bhashini project.
- The Centre for Development of Advanced Computing, India (C-DAC), for providing access to the Param Siddhi supercomputer for training our models.
- Microsoft for its grant to create datasets, tools, and resources for Indian languages.
- The JAX guide on GitHub, which informed the JAX port.
📄 License
IndicWhisper and the associated Vistaar benchmark are MIT-licensed. This license applies to all the fine-tuned language models in this repository.
Contributors
- Kaushal Bhogale (AI4Bharat)
- Sai Narayan Sundaresan (IITKGP, AI4Bharat)
- Abhigyan Raman (AI4Bharat)
- Tahir Javed (IITM, AI4Bharat)
- Mitesh Khapra (IITM, AI4Bharat, RBCDSAI)
- Pratyush Kumar (Microsoft, AI4Bharat)
🤝 Contributing
We welcome contributions from the community to enhance IndicWhisper. If you have ideas, bug fixes, or enhancements, feel free to submit a pull request.
Thank you for your interest in IndicWhisper! We hope it serves as a valuable tool for your Indian language speech recognition needs.