🚀 vegam-whisper-medium-ml (വേഗം)
This project converts thennal/whisper-medium-ml into the CTranslate2 model format. It enables the use of this model in CTranslate2 or related projects like faster-whisper, facilitating automatic speech recognition tasks.
⚠️ Important Note
The model file size is 3.06 GB.
🚀 Quick Start
This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper.
✨ Features
- Audio Processing: Specialized for audio data, suitable for automatic speech recognition tasks.
- Language Support: Supports Malayalam (ml), expanding its application scope in language processing.
- Model Compatibility: Converted to the CTranslate2 format, ensuring compatibility with related projects.
📦 Installation
1. Install faster-whisper
Install faster-whisper with pip. See the faster-whisper documentation for more installation details.
pip install faster-whisper
2. Install git-lfs
Install git-lfs, which is needed only to download the model from Hugging Face. The command below is for Debian-based systems; for other systems, see the git-lfs documentation for alternative installation methods.
apt-get install git-lfs
3. Download the model weights
git lfs install
git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml
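If git-lfs was not active during the clone, the large files in the repository are small text pointers rather than the real weights, and loading the model will fail. The following is a minimal sketch for checking this. The helper names are hypothetical, and the weights file name model.bin is an assumption (it is the name CTranslate2 converters normally write).

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is a Git LFS pointer file rather than real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def check_model_dir(model_dir, weights_name="model.bin"):
    """Return "missing", "lfs-pointer", or "ok" for the cloned directory.

    `weights_name` is an assumption, not something this project documents.
    """
    weights = Path(model_dir) / weights_name
    if not weights.is_file():
        return "missing"
    if is_lfs_pointer(weights):
        return "lfs-pointer"
    return "ok"
```

Calling check_model_dir("vegam-whisper-medium-ml") after the clone should return "ok"; a result of "lfs-pointer" suggests running git lfs install followed by git lfs pull inside the repository.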
💻 Usage Examples
Basic Usage
from faster_whisper import WhisperModel
model_path = "vegam-whisper-medium-ml"
model = WhisperModel(model_path, device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
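The example above assumes a CUDA GPU. As a hedged sketch, the device and compute type could instead be chosen at run time, falling back to CPU with int8 when no GPU is visible. The helper names here are hypothetical; ctranslate2.get_cuda_device_count() is provided by CTranslate2, which is installed as a dependency of faster-whisper.

```python
def settings_for(has_cuda):
    """Map GPU availability to a (device, compute_type) pair."""
    return ("cuda", "float16") if has_cuda else ("cpu", "int8")


def pick_device_settings():
    """Detect CUDA through CTranslate2; treat any failure as 'no GPU'."""
    try:
        import ctranslate2
        has_cuda = ctranslate2.get_cuda_device_count() > 0
    except Exception:
        has_cuda = False
    return settings_for(has_cuda)


def load_model(model_path="vegam-whisper-medium-ml"):
    """Load the converted model with settings suited to this machine."""
    # Imported here so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    device, compute_type = pick_device_settings()
    return WhisperModel(model_path, device=device, compute_type=compute_type)
```

The int8 choice on CPU trades some accuracy for lower memory use and faster inference; other compute types may suit a given machine better.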
Advanced Usage
from faster_whisper import WhisperModel
model_path = "vegam-whisper-medium-ml"
model = WhisperModel(model_path, device="cuda", compute_type="float16")
segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Detected language 'ta' with probability 0.353516
[0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ
Note: The audio file 00b38e80-80b8-4f70-babf-566e848879fc.webm is from the Malayalam Speech Corpus and is stored alongside the model weights.
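In the sample output, automatic language detection reports 'ta' with a probability of about 0.35, even though the clip is Malayalam. faster-whisper's transcribe method accepts a language argument that skips detection, so one option is to fix the language to ml. A minimal sketch follows; the function name is hypothetical and the effect on the transcript has not been measured here.

```python
def transcribe_malayalam(model, audio_path, beam_size=5):
    """Transcribe with the language fixed to Malayalam.

    `model` is expected to be a faster_whisper.WhisperModel (or any object
    with a compatible transcribe method). Returns (start, end, text) tuples.
    """
    segments, _info = model.transcribe(
        audio_path,
        beam_size=beam_size,
        language="ml",  # skip automatic language detection
    )
    # `segments` is a lazy generator; consuming it runs the transcription.
    return [(s.start, s.end, s.text) for s in segments]
```

With a loaded model, transcribe_malayalam(model, "00b38e80-80b8-4f70-babf-566e848879fc.webm") returns the segments as a list that can be printed or saved.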
📚 Documentation
Conversion Details
This conversion was made possible by the CTranslate2 library, using its Transformers converter for OpenAI Whisper. The original model was converted with the following command:
ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml
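To reproduce the conversion from a script, the same command can be assembled and run from Python. This is only a sketch: the function names are hypothetical, and the optional --quantization flag is not part of the command used for this project (it is a standard converter option, shown here as an assumption about what a reader might want to try).

```python
import subprocess


def build_convert_command(
    model="thennal/whisper-medium-ml",
    output_dir="vegam-whisper-medium-ml",
    quantization=None,
):
    """Build the argument list for ct2-transformers-converter."""
    cmd = [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
    ]
    if quantization is not None:
        # Optional and not used in the original conversion, e.g. "float16".
        cmd += ["--quantization", quantization]
    return cmd


def run_conversion(**kwargs):
    """Run the converter; raises CalledProcessError if it fails."""
    subprocess.run(build_convert_command(**kwargs), check=True)
```

Calling run_conversion() with no arguments runs the same command as above; it requires the ctranslate2 and transformers packages and downloads the source model.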
📄 License
This project is licensed under the MIT license.
👏 Acknowledgments
- Creators of CTranslate2 and faster-whisper
- Thennal D K
- Santhosh Thottingal
📋 Metadata
| Property | Details |
|---|---|
| Language | ml |
| Tags | audio, automatic-speech-recognition, vegam |
| Datasets | google/fleurs, thennal/IMaSC, mozilla-foundation/common_voice_11_0 |
| Library Name | ctranslate2 |