Whisper Hindi2Hinglish Swift
A Hindi-Hinglish mixed speech recognition model built on the Whisper architecture, designed for Indian accents and noisy environments
Model Overview
This model is a fine-tuned version of Whisper-base that transcribes Hindi speech into colloquial Hindi-English (Hinglish) mixed text, suited to speech recognition scenarios in India.
Model Features
Hindi-English mixed language support
Transcribes audio into colloquial Hindi-English mixed text, reducing the likelihood of grammatical errors
Noise environment optimization
Specially optimized for common background noise environments in India, improving recognition accuracy in noisy scenarios
Hallucination suppression
Minimizes transcription hallucinations through training techniques, enhancing the accuracy of output text
Dynamic layer freezing technology
Uses an innovative training technique to achieve rapid convergence and efficient fine-tuning
Model Capabilities
Hindi speech recognition
Hindi-English mixed text generation
Speech transcription in noisy environments
Long audio processing
Use Cases
Speech transcription services
Customer service call transcription
Transcribing customer service call content in India into text records
Maintains high recognition accuracy in noisy environments
Meeting minutes
Automatically generating Hindi-English mixed meeting summaries
Supports multi-speaker dialogue scenarios
Voice assistants
Localized voice command recognition
Providing more accurate voice command recognition for users in India
Supports Hindi-English mixed colloquial expressions
Model card metadata (YAML front matter):

```yaml
language:
- en
- hi
tags:
- audio
- automatic-speech-recognition
- whisper-event
- pytorch
inference: true
model-index:
- name: Whisper-Hindi2Hinglish-Swift
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: google/fleurs
      type: google/fleurs
      config: hi_in
      split: test
    metrics:
    - type: wer
      value: 35.0888
      name: WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: mozilla-foundation/common_voice_20_0
      type: mozilla-foundation/common_voice_20_0
      config: hi
      split: test
    metrics:
    - type: wer
      value: 38.6549
      name: WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Indic-Voices
      type: Indic-Voices
      config: hi
      split: test
    metrics:
    - type: wer
      value: 65.2147
      name: WER
widget:
- src: audios/f89b6428-c58a-4355-ad63-0752b69f2d30.wav
  output:
    text: vah bas din mein kitni baar chalti hai?
- src: audios/09cf2547-9d09-4914-926a-cf2043549c15.wav
  output:
    text: >-
      Salmaan ki image se prabhaavit hote hain is company ke share bhaav jaane kaise?
- src: audios/6f7df89f-91a7-4cbd-be43-af7bce71a34b.wav
  output:
    text: vah roya aur aur roya.
- src: audios/969bede5-d816-461b-9bf2-bd115e098439.wav
  output:
    text: helmet na pahnne se bhaarat mein har gante hoti hai chaar logon ki maut.
- src: audios/cef43941-72c9-4d28-88dd-cb62808dc056.wav
  output:
    text: usne mujhe chithi ka javaab na dene ke lie daanta.
- src: audios/b27d49fe-fced-4a17-9887-7bfbc5d4a899.wav
  output:
    text: puraana shahar divaaron se ghera hua hai.
- src: audios/common_voice_hi_23796065.mp3
  example_title: Speech Example 1
- src: audios/common_voice_hi_41666099.mp3
  example_title: Speech Example 2
- src: audios/common_voice_hi_41429198.mp3
  example_title: Speech Example 3
- src: audios/common_voice_hi_41429259.mp3
  example_title: Speech Example 4
- src: audios/common_voice_hi_40904697.mp3
  example_title: Speech Example 5
pipeline_tag: automatic-speech-recognition
license: apache-2.0
metrics:
- wer
base_model:
- openai/whisper-base
library_name: transformers
```
Whisper-Hindi2Hinglish-Swift:
- GITHUB LINK: github link
- SPEECH-TO-TEXT ARENA: Speech-To-Text Arena
Key Features:
- Hinglish as a language: Adds the ability to transcribe audio into spoken Hinglish, reducing the chance of grammatical errors
- Whisper Architecture: Based on the Whisper architecture, making it easy to use with the transformers package
- Hallucination Mitigation: Minimizes transcription hallucinations to enhance accuracy.
- Performance Increase: ~57% average performance increase versus the pretrained model across benchmark datasets
Training:
Data:
- Duration: A total of ~550 hours of noisy, Indian-accented Hindi audio was used to finetune the model.
- Collection: Because few ASR-ready Hinglish datasets are available, a specially curated proprietary dataset was used.
- Labelling: The data was labelled using a SOTA model, and the transcriptions were then improved through human review.
- Quality: Emphasis was placed on collecting noisy data, since the model is intended for Indian environments where background noise is abundant.
- Processing: All audio was chunked into segments shorter than 30 s, with at most 2 speakers per clip (a minimal chunking sketch follows this list). No further processing was done, so as not to alter the quality of the source data.
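The card does not specify the tooling used for this chunking step. As a rough illustration only, a minimal sketch of fixed 30-second chunking with pydub (an assumed library, not necessarily what the authors used) could look like this:

```python
from pydub import AudioSegment

MAX_CHUNK_MS = 30 * 1000  # keep every chunk under 30 s, matching the processing step above

def chunk_audio(path, out_prefix="chunk"):
    """Split one audio file into consecutive segments no longer than 30 s."""
    audio = AudioSegment.from_file(path)
    chunks = [audio[start:start + MAX_CHUNK_MS] for start in range(0, len(audio), MAX_CHUNK_MS)]
    for idx, chunk in enumerate(chunks):
        chunk.export(f"{out_prefix}_{idx:03d}.wav", format="wav")
    return len(chunks)

# "raw_call_recording.wav" is a hypothetical input file
print(chunk_audio("raw_call_recording.wav"), "chunks written")
```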
Finetuning:
- Novel Trainer Architecture: A custom trainer was written to ensure efficient supervised finetuning, with custom callbacks to enable higher observability during the training process.
- Custom Dynamic Layer Freezing: The most active layers in the model were identified by running inference on a subset of the training data with the pretrained model. These layers were kept unfrozen during training while all other layers were kept frozen, enabling faster convergence and efficient finetuning (a minimal sketch of this idea follows this list).
- Deepspeed Integration: Deepspeed was also utilized to speed up and optimize the training process.
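The exact layer-selection procedure is not published; the sketch below only illustrates the general idea of freezing everything except a chosen set of layers. The layer prefixes listed are hypothetical placeholders, not the layers the authors actually identified:

```python
from transformers import AutoModelForSpeechSeq2Seq

def freeze_except(model, active_prefixes):
    """Freeze every parameter whose name does not start with one of the given prefixes."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(prefix) for prefix in active_prefixes)

model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-base")

# Hypothetical "most active" layers; in the described approach these would be
# identified by running inference on a subset of the training data.
active_prefixes = [
    "model.decoder.layers.4",
    "model.decoder.layers.5",
    "model.decoder.layer_norm",
]
freeze_except(model, active_prefixes)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```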
Performance Overview
Qualitative Performance Overview
| Audio | Whisper Base | Whisper-Hindi2Hinglish-Swift |
|---|---|---|
| (audio sample) | وہاں بس دن میں کتنی بار چلتی ہے | vah bas din mein kitni baar chalti hai? |
| (audio sample) | سلمان کی ایمیت سے پراوہویت ہوتے ہیں اس کمپنی کے سیر بھاؤ جانے کیسے | salmaan ki image se prabhaavit hote hain is company ke share bhaav jaane kaise? |
| (audio sample) | تو لویا تو لویا | vah roya aur aur roya. |
| (audio sample) | حلمت نہ پیننے سے بھارت میں ہر گنٹے ہوتی ہے چار لوگوں کی موت | helmet na pahnne se bhaarat mein har gante hoti hai chaar logon ki maut. |
| (audio sample) | اوستہ مجھے چٹھیکہ جواب نہ دینے کے لیٹانٹہ | usne mujhe chithi ka javaab na dene ke lie daanta. |
| (audio sample) | پرانا شاہ دیواروں سے گیرا ہوا ہے | puraana shahar divaaron se ghera hua hai. |
Quantitative Performance Overview
Note:
- The WER scores below are for the Hinglish text generated by our model and the original Whisper model.
- To check our model's real-world performance against other SOTA models, please head to our Speech-To-Text Arena space.
| Dataset | Whisper Base (WER) | Whisper-Hindi2Hinglish-Swift (WER) |
|---|---|---|
| Common-Voice | 106.7936 | 38.6549 |
| FLEURS | 104.2783 | 35.0888 |
| Indic-Voices | 110.8399 | 65.2147 |
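For reference, the ~57% figure quoted under Key Features appears to match the average relative WER reduction over these three datasets; a quick check using the numbers from the table above:

```python
# Relative WER reduction of Whisper-Hindi2Hinglish-Swift vs. Whisper Base,
# using the values from the table above.
scores = {
    "Common-Voice": (106.7936, 38.6549),
    "FLEURS": (104.2783, 35.0888),
    "Indic-Voices": (110.8399, 65.2147),
}

reductions = {name: 100 * (base - ours) / base for name, (base, ours) in scores.items()}
for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1f}% relative WER reduction")

print(f"Average: {sum(reductions.values()) / len(reductions):.1f}%")  # ~57%
```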
Usage:
Using Transformers
- To run the model, first install the Transformers library:

```bash
pip install --upgrade transformers
```

- The model can be used with the `pipeline` class to transcribe audio of arbitrary length:
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Set device (GPU if available, otherwise CPU) and precision
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Specify the pre-trained model ID
model_id = "Oriserve/Whisper-Hindi2Hinglish-Swift"

# Load the speech-to-text model with specified configurations
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,  # Use appropriate precision (float16 for GPU, float32 for CPU)
    low_cpu_mem_usage=True,   # Optimize memory usage during loading
    use_safetensors=True      # Use safetensors format for better security
)
model.to(device)  # Move model to the selected device

# Load the processor for audio preprocessing and tokenization
processor = AutoProcessor.from_pretrained(model_id)

# Create the speech recognition pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    generate_kwargs={
        "task": "transcribe",  # Set task to transcription
        "language": "en"       # Specify English language
    }
)

# Process an audio file and print the transcription
sample = "sample.wav"  # Input audio file path
result = pipe(sample)  # Run inference
print(result["text"])  # Print transcribed text
```
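Building on the `pipe` object created above, long recordings can also be processed with chunked inference; the `chunk_length_s` and `batch_size` values below are illustrative, not settings prescribed by this model card:

```python
# Chunked long-form inference (illustrative parameters)
result = pipe("long_meeting_recording.wav", chunk_length_s=30, batch_size=8)
print(result["text"])
```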
Using the OpenAI Whisper module
- First, install the openai-whisper library:

```bash
pip install -U openai-whisper tqdm
```

- Convert the Hugging Face checkpoint to a PyTorch model:
```python
import json
import re
from collections import OrderedDict

import torch
from tqdm import tqdm
from transformers import AutoModelForSpeechSeq2Seq

# Load parameter name mapping from HF to OpenAI format
with open('convert_hf2openai.json', 'r') as f:
    reverse_translation = json.load(f)

reverse_translation = OrderedDict(reverse_translation)

def save_model(model, save_path):
    def reverse_translate(current_param):
        # Convert parameter names using regex patterns
        for pattern, repl in reverse_translation.items():
            if re.match(pattern, current_param):
                return re.sub(pattern, repl, current_param)

    # Extract model dimensions from config
    config = model.config
    model_dims = {
        "n_mels": config.num_mel_bins,                   # Number of mel spectrogram bins
        "n_vocab": config.vocab_size,                    # Vocabulary size
        "n_audio_ctx": config.max_source_positions,      # Max audio context length
        "n_audio_state": config.d_model,                 # Audio encoder state dimension
        "n_audio_head": config.encoder_attention_heads,  # Audio encoder attention heads
        "n_audio_layer": config.encoder_layers,          # Number of audio encoder layers
        "n_text_ctx": config.max_target_positions,       # Max text context length
        "n_text_state": config.d_model,                  # Text decoder state dimension
        "n_text_head": config.decoder_attention_heads,   # Text decoder attention heads
        "n_text_layer": config.decoder_layers,           # Number of text decoder layers
    }

    # Convert model state dict to Whisper format
    original_model_state_dict = model.state_dict()
    new_state_dict = {}
    for key, value in tqdm(original_model_state_dict.items()):
        key = key.replace("model.", "")   # Remove 'model.' prefix
        new_key = reverse_translate(key)  # Convert parameter names
        if new_key is not None:
            new_state_dict[new_key] = value

    # Create final model dictionary
    pytorch_model = {"dims": model_dims, "model_state_dict": new_state_dict}

    # Save converted model
    torch.save(pytorch_model, save_path)

# Load Hugging Face model
model_id = "Oriserve/Whisper-Hindi2Hinglish-Swift"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,  # Optimize memory usage
    use_safetensors=True     # Use safetensors format
)

# Convert and save model
model_save_path = "Whisper-Hindi2Hinglish-Swift.pt"
save_model(model, model_save_path)
```
- Transcribe:

```python
import whisper

# Load the converted model with openai-whisper and transcribe
model = whisper.load_model("Whisper-Hindi2Hinglish-Swift.pt")
result = model.transcribe("sample.wav")
print(result["text"])
```
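Standard openai-whisper options can also be passed through `transcribe` on the `model` loaded above; the values below are illustrative rather than recommended settings:

```python
# Optional decoding settings supported by openai-whisper (illustrative values)
result = model.transcribe("sample.wav", task="transcribe", language="en", fp16=False, verbose=True)
print(result["text"])
```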
Miscellaneous
This model is from a family of transformers-based ASR models trained by Oriserve. To compare this model against other models from the same family, or against other SOTA models, please head to our Speech-To-Text Arena. To learn more about our other models, or for any other queries regarding AI voice agents, you can reach out to us at ai-team@oriserve.com.