Malaysian-Whisper-Base Open Source Speech Recognition Model - Free Support for Malay and English Recognition

Home

Malaysian Whisper Base

Developed by mesolitica

Whisper base model fine-tuned on Malaysian datasets, supporting Malay and English speech recognition

Speech Recognition

Transformers

Supports Multiple Languages#Malay speech recognition #Multi-dialect support #English-Malay bilingual

Downloads 143

Release Time : 1/1/2024

Model Overview

This model is a speech recognition model based on the Whisper architecture, specifically fine-tuned for Malay and English in the Malaysian context, suitable for speech-to-text tasks involving Malaysian accents and dialects.

Model Features

Malaysian Language Optimization

Specifically optimized for Malay and English accents in Malaysia, including standard Malay and dialects

Multi-source Training Data

Trained using various data sources including IMDA speech-to-text datasets and pseudo-labeled Malaysian YouTube video datasets

Bilingual Support

Supports both Malay and English speech recognition, including Manglish (Malaysian English)

Timestamp Support

Capable of generating transcriptions with timestamps

Model Capabilities

Malay speech recognition

English speech recognition

Timestamped transcription

Malaysian accent recognition

Use Cases

Speech Transcription

Meeting Minutes

Automatically transcribe meeting recordings in Malaysia into text

Accurately recognizes Malay and English with Malaysian accents

Media Content Subtitling

Automatically generate subtitles for Malaysian YouTube videos

Supports recognition of dialects and local accents

Speech Analysis

Speech Data Analysis

Analyze speech data from Malaysia to gain insights

Capable of processing language variants unique to Malaysia

🚀 Malaysian Finetune Whisper Base

This project focuses on fine - tuning the Whisper Base model on a Malaysian dataset. It aims to enhance the model's performance in transcribing Malaysian languages, including Malay and English, with various accents and dialects.

🚀 Quick Start

📦 Installation

Ensure you have the necessary libraries installed. You can install them using pip:

pip install transformers datasets requests

💻 Usage Examples

🔍 Basic Usage

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq, pipeline
from datasets import Audio
import requests

sr = 16000
audio = Audio(sampling_rate=sr)

processor = AutoProcessor.from_pretrained("mesolitica/malaysian-whisper-base")
model = AutoModelForSpeechSeq2Seq.from_pretrained("mesolitica/malaysian-whisper-base")

r = requests.get('https://huggingface.co/datasets/huseinzol05/malaya-speech-stt-test-set/resolve/main/test.mp3')
y = audio.decode_example(audio.encode_example(r.content))['array']
inputs = processor([y], return_tensors = 'pt')
r = model.generate(inputs['input_features'], language='ms', return_timestamps=True)
processor.tokenizer.decode(r[0])

The output for Malay language prediction:

'<|startoftranscript|><|ms|><|transcribe|> Zamily On Aging di Vener Australia, Australia yang telah diadakan pada tahun 1982 dan berasaskan unjuran tersebut maka jabatan perangkaan Malaysia menganggarkan menjelang tahun 2005 sejumlah 15% penduduk kita adalah daripada kalangan warga emas. Untuk makluman Tuan Yang Pertua dan juga Alian Bohon, pembangunan sistem pendafiran warga emas ataupun kita sebutkan event adalah usaha kerajaan ke arah merealisasikan objektif yang telah digangkatkan<|endoftext|>'

🔍 Advanced Usage (Predicting in English)

r = model.generate(inputs['input_features'], language='en', return_timestamps=True)
processor.tokenizer.decode(r[0])

The output for English language prediction:

<|startoftranscript|><|en|><|transcribe|> Assembly on Aging, Divina Australia, Australia, which has been provided in 1982 and the operation of the transportation of Malaysia's implementation to prevent the tourism of the 25th, 15% of our population is from the market. For the information of the President and also the respected, the development of the market system or we have made an event.<|endoftext|>

🎧 Predicting Longer Audio

⚠️ Important Note

You need to chunk the audio by 30 seconds and predict each sample.

📚 Documentation

📊 Datasets Used

The model is fine - tuned on the following datasets:

Property	Details
Datasets	1. IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA - STT 2. Pseudolabel Malaysian youtube videos, https://huggingface.co/datasets/mesolitica/pseudolabel - malaysian - youtube - whisper - large - v3 3. Malay Conversational Speech Corpus, https://huggingface.co/datasets/malaysia - ai/malay - conversational - speech - corpus 4. Haqkiem TTS Dataset (private, request access from https://www.linkedin.com/in/haqkiem - daim/) 5. Pseudolabel Nusantara audiobooks, https://huggingface.co/datasets/mesolitica/nusantara - audiobook

Property

Details

Datasets

1. IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA - STT
2. Pseudolabel Malaysian youtube videos, https://huggingface.co/datasets/mesolitica/pseudolabel - malaysian - youtube - whisper - large - v3
3. Malay Conversational Speech Corpus, https://huggingface.co/datasets/malaysia - ai/malay - conversational - speech - corpus
4. Haqkiem TTS Dataset (private, request access from https://www.linkedin.com/in/haqkiem - daim/)
5. Pseudolabel Nusantara audiobooks, https://huggingface.co/datasets/mesolitica/nusantara - audiobook

🌐 Languages Finetuned

ms, Malay, can be standard Malay and local Malay.
en, English, can be standard English and Manglish.

📈 Project Links

Script: https://github.com/mesolitica/malaya - speech/tree/malaysian - speech/session/whisper
Wandb: https://wandb.ai/huseinzol05/malaysian - whisper - base?workspace = user - huseinzol05
Wandb report: https://wandb.ai/huseinzol05/malaysian - whisper - base/reports/Finetune - Whisper --Vmlldzo2Mzg2NDgx

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご