Whisper-Telugu-Large-v2 Open-Source Speech Recognition Model - Accurately Identify Telugu Speech Content

Whisper Telugu Large V2

Developed by vasista22

A Telugu automatic speech recognition model fine-tuned from OpenAI Whisper-large-v2 on multiple public Telugu datasets

Speech Recognition | Other | Open Source License: Apache-2.0 | #Telugu Speech Recognition #Low Word Error Rate #Multi-corpus Training

Downloads 156

Release Time: 12/20/2022

Model Overview

A speech recognition model specifically optimized for Telugu, capable of accurately converting Telugu speech into text

Model Features

Telugu Optimization

Specially fine-tuned for Telugu, providing more accurate speech recognition results

Multi-dataset Training

Trained on multiple public Telugu ASR corpora, including CSTD IIIT-H, ULCA, Shrutilipi, etc.

Efficient Inference Support

Supports accelerated inference using whisper-jax

Model Capabilities

Telugu Speech Recognition

Long Audio Processing (supports chunking)

Multi-domain Speech Transcription

Use Cases

Speech Transcription

Meeting Minutes

Convert Telugu meeting recordings into text transcripts

Media Subtitle Generation

Generate subtitles for Telugu video content

Voice Assistants

Telugu Voice Interaction

Supports Telugu voice command recognition
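For the subtitle use case, timestamped transcription output can be turned into an SRT file. The sketch below is illustrative only: it assumes the transformers pipeline shown under Usage Examples is called with return_timestamps=True and returns a "chunks" list of dicts with "timestamp" and "text" keys. The helper names format_srt_time and chunks_to_srt are hypothetical and not part of the model or its repository.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Convert a list of {"timestamp": (start, end), "text": str} dicts to SRT text.

    Assumption: an end time of None (possible for a final segment) falls back
    to the start time so the entry stays well-formed.
    """
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            end = start
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hand-written sample in the assumed output shape; not real model output.
    sample = [
        {"timestamp": (0.0, 2.5), "text": " first segment"},
        {"timestamp": (2.5, 6.0), "text": " second segment"},
    ]
    print(chunks_to_srt(sample))
```

With a real transcription, the list passed to chunks_to_srt would be the "chunks" entry of the pipeline result rather than the hand-written sample.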

🚀 Whisper Telugu Large-v2

This model is a fine-tuned version of openai/whisper-large-v2 on Telugu data from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint.

⚠️ Important Note

The code used to train this model is available for re-use in the whisper-finetune repository.

✨ Features

Fine-tuned on multiple Telugu ASR corpora.
Code for training, evaluation, and faster inference is available in the whisper-finetune repository.

📦 Installation

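The original model card lists no installation steps. The examples below assume that the torch and transformers packages are installed; the accelerated example additionally requires whisper-jax.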

💻 Usage Examples

Basic Usage

To transcribe a single audio file with this model, use the following code snippet:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-telugu-large-v2", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
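The chunk_length_s=30 argument above makes the pipeline split long audio into overlapping windows before transcription. The sketch below illustrates how such windows could be laid out; it is a simplified illustration, not the library's implementation. The function name chunk_spans is hypothetical, and the 5-second stride on each side is an assumption based on the pipeline's documented default of chunk_length_s / 6.

```python
def chunk_spans(total_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) windows in seconds covering an audio of length total_s.

    Consecutive windows overlap by 2 * stride_s, so each window advances by
    chunk_s - 2 * stride_s (assumed layout, for illustration only).
    """
    if chunk_s <= 2 * stride_s:
        raise ValueError("chunk_s must be larger than twice stride_s")
    if total_s <= 0:
        return []
    step = chunk_s - 2 * stride_s
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second recording with 30-second windows and 5-second strides.
    print(chunk_spans(70.0))
    # → [(0.0, 30.0), (20.0, 50.0), (40.0, 70.0)]
```

The overlap gives the model context on both sides of each boundary, which is why words cut at a window edge can still be recovered when the per-window transcripts are merged.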

Advanced Usage

For faster inference with Whisper models, the whisper-jax library can be used. Follow the installation steps in the whisper-jax repository before running the following code snippet:

>>> # note: the class name is spelled "FlaxWhisperPipline" in whisper-jax
>>> from whisper_jax import FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-telugu-large-v2", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="te", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

📚 Documentation

To evaluate this model on an entire dataset, use the evaluation scripts in the whisper-finetune repository. The same repository also provides scripts for faster inference using whisper-jax.
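ASR evaluation is usually reported as word error rate (WER): the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a minimal, self-contained illustration of that metric. It is not the repository's evaluation code, which may apply text normalization or use a dedicated library; the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only; not taken from any evaluation set.
    reference = "నా పేరు రవి"
    hypothesis = "నా పేరు రాము"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.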

🔧 Technical Details

Training and Evaluation Data

Property	Details
Training Data	CSTD IIIT-H ASR Corpus, [ULCA ASR Corpus](https://github.com/Open-Speech-EkStep/ULCA-asr-dataset-corpus#telugu-labelled-total-duration-is-102593-hours), Shrutilipi ASR Corpus, [Microsoft Speech Corpus (Indian Languages)](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Train+Dev set, Babel ASR Corpus
Evaluation Data	[Microsoft Speech Corpus (Indian Languages) Test Set](https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e), Google/Fleurs Test Set, OpenSLR, Babel Test Set

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.75e-05
train_batch_size: 8
eval_batch_size: 32
seed: 22
optimizer: adamw_bnb_8bit
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 22000
training_steps: 75000
mixed_precision_training: True
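To make the scheduler settings concrete, the sketch below computes the learning rate implied by a linear schedule with warmup: a linear ramp from 0 to the peak rate over the warmup steps, followed by a linear decay to 0 at the final training step. The values are taken from the list above (0.75e-05 is written as 7.5e-6). The function name linear_schedule_lr is hypothetical, and this is an illustration of the schedule shape, not the training code itself.

```python
def linear_schedule_lr(step, peak_lr=7.5e-6, warmup_steps=22_000, total_steps=75_000):
    """Learning rate at a given step for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 11_000, 22_000, 48_500, 75_000):
        print(f"step {step:>6}: lr = {linear_schedule_lr(step):.3e}")
```

With these settings the warmup covers 22000 of the 75000 steps, so the model trains at or near the peak rate only around step 22000.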

📄 License

This model is licensed under the Apache-2.0 license.

Acknowledgement

This work was done at Speech Lab, IIT Madras. The compute resources for this work were funded by the "Bhashini: National Language Translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
