Vegam-Whisper-Medium-ML: Malayalam Speech Recognition Model


Developed by smcproject

This is a version of thennal/whisper-medium-ml, converted to the CTranslate2 model format, for Malayalam speech recognition.

Category: Speech Recognition, Other
License: MIT (open source)
Tags: Malayalam speech recognition, Fast inference, Multi-dataset training

Downloads 83

Release Time : 5/19/2023

Model Overview

This is a Malayalam automatic speech recognition (ASR) model based on the Whisper architecture and converted to the CTranslate2 format to improve inference speed.

Model Features

Efficient inference

Converted to the CTranslate2 format, so it can be run with faster-whisper for fast inference.

Multi-dataset training

Trained on multiple datasets, including google/fleurs, thennal/IMaSC, and mozilla-foundation/common_voice_11_0.

Multi-precision support

Supports compute types such as FP16 and INT8, so inference can be matched to the available hardware (GPU or CPU).

Model Capabilities

Malayalam speech recognition

Audio-to-text conversion

Multi-precision inference

Use Cases

Speech transcription

Audio file transcription

Convert Malayalam speech files to text

The Advanced Usage example transcribes a sample clip from the Malayalam Speech Corpus.

Speech processing applications

Voice assistant

Used for developing Malayalam voice assistants

🚀 vegam-whisper-medium-ml (വേഗം)

This project converts thennal/whisper-medium-ml into the CTranslate2 model format. It enables the use of this model in CTranslate2 or related projects like faster-whisper, facilitating automatic speech recognition tasks.

⚠️ Important Note

The model file size is 3.06 GB.

🚀 Quick Start

This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper.

✨ Features

Audio Processing: Specialized for audio data, suitable for automatic speech recognition tasks.
Language Support: Supports Malayalam (language code ml).
Model Compatibility: Converted to the CTranslate2 format, ensuring compatibility with related projects.

📦 Installation

1. Install `faster-whisper`

Install faster-whisper with pip. See the faster-whisper project documentation for more installation details.

pip install faster-whisper

2. Install `git-lfs`

Install git-lfs, which is needed only to download the model weights from Hugging Face. The command below is for Debian-based systems; on other systems, see the git-lfs installation instructions.

apt-get install git-lfs

3. Download the model weights

git lfs install
git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml

💻 Usage Examples

Basic Usage

from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml"

# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
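
The three device and precision combinations in the comments above can be selected programmatically. This is a sketch only: pick_compute_type and load_model are hypothetical helper names, and the mapping simply mirrors the combinations listed in the example rather than any tuning advice from the model authors.

```python
def pick_compute_type(device, prefer_int8=False):
    """Map a device name to one of the compute types used in the example."""
    if device == "cuda":
        # On GPU, choose between FP16 and INT8 weights with FP16 compute.
        return "int8_float16" if prefer_int8 else "float16"
    if device == "cpu":
        return "int8"
    raise ValueError(f"unsupported device: {device!r}")


def load_model(model_path, device="cpu", prefer_int8=False):
    """Load the converted model with a compute type matching the device."""
    # Imported lazily so pick_compute_type works without faster-whisper.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_path,
        device=device,
        compute_type=pick_compute_type(device, prefer_int8),
    )
```

For example, load_model("vegam-whisper-medium-ml", device="cuda") would load the model in FP16, matching the first option in the basic example.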

Advanced Usage

from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml"

model = WhisperModel(model_path, device="cuda", compute_type="float16")

segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Output:

Detected language 'ta' with probability 0.353516

[0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ

Note: The audio file 00b38e80-80b8-4f70-babf-566e848879fc.webm is from the Malayalam Speech Corpus and is stored alongside the model weights.
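
In the output above, language detection reports 'ta' with a probability of about 0.35 even though the clip is Malayalam. The transcribe method in faster-whisper accepts a language argument that skips detection, so one option is to pass language="ml" explicitly. The sketch below assumes a model object loaded as in the earlier examples; transcribe_malayalam and format_segment are hypothetical helper names, and whether fixing the language changes the transcript for this clip has not been verified here.

```python
def format_segment(start, end, text):
    """Format one segment in the same style as the examples above."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text.strip())


def transcribe_malayalam(model, audio_path, beam_size=5):
    """Transcribe audio with the language fixed to Malayalam.

    model is expected to be a faster_whisper.WhisperModel instance.
    Passing language="ml" skips automatic language detection.
    """
    segments, _info = model.transcribe(
        audio_path, beam_size=beam_size, language="ml"
    )
    # segments is a generator; decoding happens while iterating over it.
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Each returned string has the same layout as the printed lines in the examples, so the result can be printed or written to a file line by line.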

📚 Documentation

Conversion Details

This conversion was made possible by the CTranslate2 library, using its Transformers converter for OpenAI Whisper. The original model was converted with the following command:

ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml
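
The same conversion can be scripted. The sketch below builds the command shown above and optionally appends the converter's --quantization option (for example float16) to store the weights at reduced precision. The helper names are hypothetical, and the command documented for this model does not use --quantization, so treat that part as an optional variation rather than how these weights were produced.

```python
import subprocess


def build_convert_command(model, output_dir, quantization=None):
    """Build the ct2-transformers-converter argument list."""
    cmd = [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
    ]
    if quantization is not None:
        # Optional: not part of the command documented for this model.
        cmd += ["--quantization", quantization]
    return cmd


def convert(model, output_dir, quantization=None):
    """Run the converter; requires ctranslate2 and transformers installed."""
    subprocess.run(
        build_convert_command(model, output_dir, quantization), check=True
    )
```

Calling build_convert_command("thennal/whisper-medium-ml", "vegam-whisper-medium-ml") reproduces the documented command as an argument list.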

📄 License

This project is licensed under the MIT license.

👏 Acknowledgments

Creators of CTranslate2 and faster-whisper
Thennal D K
Santhosh Thottingal

📋 Metadata

Property	Details
Language	ml
Tags	audio, automatic-speech-recognition, vegam
Datasets	google/fleurs, thennal/IMaSC, mozilla-foundation/common_voice_11_0
Library Name	ctranslate2
