# Whisper-medium-ml Open-source Speech Recognition Model - Free Automatic Speech Recognition for Malayalam

Whisper Medium Ml

Developed by thennal

A Malayalam automatic speech recognition model fine-tuned from OpenAI's Whisper-medium on datasets including Common Voice 11.0.

Speech Recognition

Transformers

Open-source license: Apache-2.0

Tags: #Malayalam ASR #Low Word Error Rate #Multi-dataset Fine-tuning

Downloads 127

Release date: 12/12/2022

Model Overview

This model is an automatic speech recognition (ASR) system for Malayalam, fine-tuned from the Whisper-medium checkpoint for speech-to-text transcription.

Model Features

Multi-dataset Training

Trained on Common Voice 11.0, FLEURS, and several Malayalam-specific datasets (thennal/IMaSC, thennal/ulca_ml, thennal/msc, thennal/indic_tts_ml)

Optimized Error Rate

Achieves a word error rate (WER) of 11.49 on the Common Voice 11.0 test set with text normalization (38.62 without normalization)

Text Normalization

Whisper's default text normalization handles Malayalam poorly, so evaluation scores are reported both without and with normalization

Model Capabilities

Malayalam speech recognition

Long audio processing (audio is split into 30-second chunks)

Timestamped transcription (optional)
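To make the chunking idea concrete, here is a minimal sketch of how a long recording could be cut into overlapping 30-second windows. The function name `chunk_windows` and the 5-second overlap on each side are illustrative assumptions (the Transformers pipeline documents a default stride of one sixth of the chunk length), and this is a simplified model of the idea, not the library's actual code.

```python
def chunk_windows(total_s, chunk_s=30.0, stride_s=5.0):
    """Return (start, end) windows in seconds that cover total_s of audio.

    Consecutive windows overlap by 2 * stride_s so that words cut at a
    boundary appear whole in at least one window. Hypothetical helper.
    """
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("stride_s must be less than half of chunk_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:  # this window reaches the end of the audio
            break
        start += step
    return windows


# A 70-second recording is covered by three overlapping windows.
print(chunk_windows(70.0))
```

Each window is transcribed independently and the overlapping regions are reconciled afterwards, which is why a model with a 30-second input limit can still handle longer recordings.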

Use Cases

Speech Transcription

Speech Content Transcription

Convert Malayalam speech content into text

Corresponds to roughly 88.51% word-level accuracy (100 minus the normalized WER of 11.49) on the Common Voice 11.0 test set

Assistive Tools

Accessibility Applications

Can generate captions for deaf and hard-of-hearing users

🚀 Whisper Medium Malayalam

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium), designed to enhance automatic speech recognition for Malayalam.

🚀 Quick Start

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the Common Voice 11.0 dataset. It achieves the following results on the evaluation set:

WER: 38.6207
CER: 7.3256
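For readers unfamiliar with the two metrics, the sketch below computes WER and CER as edit distance divided by reference length, expressed as percentages. It is a plain-Python illustration, not the evaluation script behind the scores above, and the helper names and sample strings are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces count as characters)."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# One wrong character in a long word marks the whole word as an error.
reference = "abcdefgh ijklmnop"
hypothesis = "abcdefgx ijklmnop"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

The toy example shows how a single character error can produce a high WER alongside a low CER, which is one possible reading of the gap between the two scores above for a language with long words.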

Note that Whisper's normalization has major issues for languages like Malayalam, so the above scores are evaluated without using normalization. With normalization (for a fair comparison with other models on this platform), the results are instead:

WER: 11.49
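The model card does not say why normalization is problematic. One plausible explanation, offered here as an assumption, is that a normalizer which strips Unicode mark characters also strips Malayalam vowel signs, because those are encoded as marks. The sketch below uses only the standard library; `strip_marks` is a hypothetical stand-in, not Whisper's normalizer.

```python
import unicodedata


def strip_marks(text):
    """Replace every Unicode mark (category M*) with a space.

    Hypothetical stand-in for a normalizer that treats marks as removable.
    """
    return "".join(
        " " if unicodedata.category(ch).startswith("M") else ch
        for ch in unicodedata.normalize("NFKC", text)
    )


word = "മലയാളം"  # "Malayalam" written in Malayalam script
marks = [ch for ch in word if unicodedata.category(ch).startswith("M")]

print(len(word), "code points, of which", len(marks), "are marks")
print(strip_marks(word).split())  # the word falls apart into fragments
```

If both reference and hypothesis are broken up this way, vowel-sign mistakes disappear and the word count changes, so a normalized score is not comparable with an unnormalized one.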

[This Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb) can be used as a starting point to further fine-tune the model.

💻 Usage Examples

Basic Usage

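from transformers import pipeline, WhisperProcessor

# Hypothetical placeholder: path to a local audio file to transcribe.
audio = "sample.wav"

processor = WhisperProcessor.from_pretrained("thennal/whisper-medium-ml")
# Force Malayalam transcription instead of relying on language detection.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="ml", task="transcribe")
asr = pipeline(
    "automatic-speech-recognition",
    model="thennal/whisper-medium-ml",
    device=0,  # first CUDA GPU; use device=-1 to run on CPU
)
transcription = asr(
    audio,
    chunk_length_s=30,  # split long audio into 30-second chunks
    max_new_tokens=448,
    return_timestamps=False,
    # do_sample=True makes output non-deterministic; set False for greedy decoding.
    generate_kwargs={"forced_decoder_ids": forced_decoder_ids, "do_sample": True},
)
print(transcription["text"])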
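For the captioning use case, the same pipeline can be called with `return_timestamps=True`, which in Transformers versions that support it returns a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries. The sketch below turns such a list into SRT caption text. The helper names and the sample chunks are hypothetical, and the sample data stands in for real model output.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks):
    """Convert pipeline-style timestamp chunks into SRT caption text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # the final segment may lack an end time
            end = start
        entries.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Hand-written stand-in for asr(audio, return_timestamps=True)["chunks"].
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " നമസ്കാരം"},
    {"timestamp": (2.5, 5.04), "text": " സുഖമാണോ"},
]
srt_text = chunks_to_srt(example_chunks)
print(srt_text)
```

Writing `srt_text` to a `.srt` file gives a caption track that most video players can load.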

📚 Documentation

Model Details

| Property | Details |
|---|---|
| Model Type | Fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) |
| Training Data | mozilla-foundation/common_voice_11_0, google/fleurs, thennal/IMaSC, thennal/ulca_ml, thennal/msc, thennal/indic_tts_ml |
| Metrics | wer |
| Base Model | openai/whisper-medium |

Model Index

Name: Whisper Medium Malayalam - Thennal D K
- Results:
  - Task:
    - Type: automatic-speech-recognition
    - Name: Automatic Speech Recognition
  - Dataset:
    - Name: Common Voice 11.0
    - Type: mozilla-foundation/common_voice_11_0
    - Config: ml
    - Split: test
    - Args: ml
  - Metrics:
    - Type: wer
    - Value: 11.49
    - Name: WER

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 8000
mixed_precision_training: Native AMP
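To show what the `linear` scheduler with 500 warmup steps implies, the sketch below computes the learning rate at a given step, assuming the usual shape: a linear ramp from 0 to the peak rate during warmup, then a linear decay to 0 at the final step. The function name `linear_lr` is illustrative, and the code models the schedule's shape, not the training script itself.

```python
def linear_lr(step, peak_lr=1e-5, warmup_steps=500, total_steps=8000):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup period.
        return peak_lr * step / warmup_steps
    # Decay from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 4250, 8000):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 1e-05 at step 500 and has dropped to half of that by step 4250, the midpoint of the decay phase.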

Framework Versions

Transformers 4.26.0.dev0
Pytorch 1.13.0+cu117
Datasets 2.7.1.dev0
Tokenizers 0.13.2

📄 License

This model is licensed under the Apache-2.0 license.
