Whisper Small Sinhala - Lingalingeswaran
This model is a fine-tuned version of openai/whisper-small on the Lingalingeswaran/asr-sinhala-dataset_json_v1 dataset, designed for Sinhala speech-related tasks.
Quick Start
Load the model with the Transformers pipeline API; see the Usage Examples section below for a ready-to-run transcription demo.
Features
Model description
This Whisper model has been fine-tuned specifically for the Sinhala language using the Common Voice 11.0 dataset. It is designed to handle tasks such as speech-to-text transcription and language identification, making it suitable for applications where Sinhala is a primary language of interest. The fine-tuning process focused on enhancing performance for Sinhala, aiming to reduce the error rate in transcriptions and improve general accuracy.
Intended uses & limitations
Intended Uses
- Speech-to-text transcription in Sinhala
Limitations
- May not perform as well on languages or dialects that are not well-represented in the Common Voice dataset.
- Higher Word Error Rate (WER) in noisy environments or with speakers who have heavy accents not covered in the training data.
- The model is optimized for Sinhala; performance in other languages may be suboptimal.
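The Word Error Rate (WER) mentioned above is word-level edit distance divided by the number of reference words. The model card does not say how its metrics were computed (libraries such as jiwer are commonly used); the following is only a minimal illustrative sketch of the metric itself:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a transcript with one substituted word out of three reference words has a WER of 1/3.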
Installation
The original model card provides no installation steps.
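As a sketch, the usage example below needs roughly the following packages (exact pins are not specified in the card; see Framework versions for the versions used in training):

```shell
pip install torch transformers gradio
```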
Usage Examples
Basic Usage
Here is an example of how to use the model for Sinhala speech recognition with Gradio:
```python
import gradio as gr
from transformers import pipeline

# Load the fine-tuned Whisper model for Sinhala speech recognition
pipe = pipeline(model="Lingalingeswaran/whisper-small-sinhala")

def transcribe(audio):
    # The pipeline accepts a file path and returns a dict with the transcript
    text = pipe(audio)["text"]
    return text

iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Whisper Small Sinhala",
    description="Realtime demo for Sinhala speech recognition using a fine-tuned Whisper small model.",
)

if __name__ == "__main__":
    iface.launch()
```
Documentation
Training and evaluation data
The training data for this model consists of Sinhala voice recordings from the Common Voice 11.0 dataset (Mozilla Foundation), a crowd-sourced collection of transcribed speech that provides diversity in speaker accents, age groups, and speech styles.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
- mixed_precision_training: Native AMP
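The linear scheduler with warmup listed above (the behavior of Transformers' `get_linear_schedule_with_warmup`) ramps the learning rate from 0 to the base rate over the warmup steps, then decays it linearly to 0 at the final training step. A minimal sketch of that schedule, assuming the hyperparameters in the list:

```python
def lr_at(step: int, base_lr: float = 1e-5,
          warmup_steps: int = 500, total_steps: int = 4000) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr at warmup_steps to 0 at total_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)
```

With these values, the peak rate of 1e-05 is reached at step 500 and the rate falls to 0 at step 4000.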
Framework versions
- Transformers 4.48.1
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
Technical Details
No specific technical details beyond what's already covered are provided, so this section is skipped.
License
This model is released under the apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned Whisper Small for Sinhala |
| Training Data | Lingalingeswaran/asr-sinhala-dataset_json_v1; Common Voice 11.0 (Mozilla Foundation) |