whisper-small-uz-en-ru-lang-id Open-Source Multilingual Speech Model - Supports Uzbek, English and Russian Speech Recognition and Classification

Whisper Small Uz En Ru Lang Id

Developed by fitlemon

A spoken-language identification model fine-tuned from Whisper-small. It classifies audio as Uzbek, English, or Russian.

Audio Classification

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multilingual speech recognition #High-accuracy classification #Uzbek-English-Russian support

Downloads 17

Release Time: 3/7/2024

Model Overview

This model is a speech classification model fine-tuned from openai/whisper-small. It is trained on Uzbek, English, and Russian audio and is primarily used to identify which of these three languages is spoken in a recording.

Model Features

Multilingual support

Identifies whether speech is in Uzbek, English, or Russian.

High accuracy

Achieves 97.47% accuracy and 97.46% F1 score on the validation set.

Based on Whisper architecture

Fine-tuned from the pretrained Whisper-small checkpoint (openai/whisper-small) rather than trained from scratch.

Model Capabilities

Speech recognition

Language classification

Multilingual processing

Use Cases

Speech recognition

Multilingual speech classification

Identify whether speech content is in Uzbek, English, or Russian.

Test set accuracy reaches 92.4%.

🚀 whisper-small-uz-en-ru-lang-id

This model is a fine-tuned version of openai/whisper-small on the "mozilla-foundation/common_voice_16_1" (uz/en/ru) dataset. It classifies the language of an audio clip as Uzbek, English, or Russian.

🚀 Quick Start

This model is a fine-tuned version of openai/whisper-small on the "mozilla-foundation/common_voice_16_1" (uz/en/ru) dataset. It achieves the following results on the validation set during training:

Loss: 0.2065
Accuracy: 0.9747
F1: 0.9746

Accuracy on the test (evaluation) dataset: 92.4%.
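As a quick way to try the classifier, the sketch below wraps the transformers audio-classification pipeline. The repository ID is an assumption assembled from the developer and model names on this page, and the label strings in the demo are placeholders, since the card does not list the model's actual label names.

```python
from typing import Dict, List

# Assumed Hub repository ID, built from the developer and model names on this page.
MODEL_ID = "fitlemon/whisper-small-uz-en-ru-lang-id"


def top_language(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    return max(predictions, key=lambda p: p["score"])["label"]


def identify_language(audio_path: str) -> str:
    """Classify one audio file; downloads the model on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return top_language(classifier(audio_path))


if __name__ == "__main__":
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    # These label strings are placeholders, not the model's confirmed labels.
    example = [
        {"label": "uz", "score": 0.91},
        {"label": "ru", "score": 0.06},
        {"label": "en", "score": 0.03},
    ]
    print(top_language(example))
```

Calling identify_language("clip.wav") with a real audio file (a hypothetical path here) would return the predicted label once the model has been downloaded.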

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

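from os import environ
from datasets import load_dataset, concatenate_datasets

# Assumption: `env` reads an environment variable; the original helper is not shown.
env = environ.get

# Datasets for each language in the set {uz: Uzbek, en: English, ru: Russian}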
common_voice_train_uz = load_dataset("mozilla-foundation/common_voice_16_1", "uz", split='train', trust_remote_code=True, token=env('HUGGING_TOKEN'), streaming=True)
common_voice_train_ru = load_dataset("mozilla-foundation/common_voice_16_1", "ru", split='train', trust_remote_code=True, token=env('HUGGING_TOKEN'), streaming=True)
common_voice_train_en = load_dataset("mozilla-foundation/common_voice_16_1", "en", split='train', trust_remote_code=True, token=env('HUGGING_TOKEN'), streaming=True)
common_voice_valid_uz = load_dataset("mozilla-foundation/common_voice_16_1", "uz", split='validation', trust_remote_code=True, token=env('HUGGING_TOKEN'), streaming=True)
common_voice_valid_ru = load_dataset("mozilla-foundation/common_voice_16_1", "ru", split='validation', trust_remote_code=True, token=env('HUGGING_TOKEN'), streaming=True)
common_voice_valid_en = load_dataset("mozilla-foundation/common_voice_16_1", "en", split='validation', trust_remote_code=True, token=env('HUGGING_TOKEN'), streaming=True)

# Shuffle each dataset and keep a limited number of rows. Rows per set: train 24,000, validation 3,000.
... 
# concatenate 3 datasets
common_voice['train'] = concatenate_datasets([common_voice_train_uz, common_voice_train_ru, common_voice_train_en])
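The snippet above elides the shuffle-and-limit step, and the notebook's actual code is not reproduced here. Purely as an illustration of the idea, the sketch below caps each language stream at a fixed size and attaches a class label before combining. The helper name, the "language" key, and the even split of 8,000 rows per language (24,000 / 3) are all assumptions. With real streaming datasets, the datasets library's IterableDataset.shuffle and IterableDataset.take methods would typically do the shuffling and truncation.

```python
from itertools import chain, islice
from typing import Dict, Iterable, Iterator

# Assumption: the 24,000 training rows are split evenly across three languages.
TRAIN_ROWS_PER_LANGUAGE = 24000 // 3


def label_and_limit(stream: Iterable[Dict], language: str, limit: int) -> Iterator[Dict]:
    """Yield at most `limit` examples, each tagged with its language code."""
    for example in islice(stream, limit):
        # Copy the example so the source stream is left unmodified.
        yield {**example, "language": language}


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the three streaming datasets.
    fake_streams = {
        "uz": [{"path": f"uz_{i}.mp3"} for i in range(5)],
        "ru": [{"path": f"ru_{i}.mp3"} for i in range(5)],
        "en": [{"path": f"en_{i}.mp3"} for i in range(5)],
    }
    combined = list(
        chain.from_iterable(
            label_and_limit(stream, lang, limit=2)
            for lang, stream in fake_streams.items()
        )
    )
    print(len(combined), combined[0])
```

Because the helper accepts any iterable of dictionaries, it works on in-memory lists as well as on lazily streamed examples.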

Training procedure

The model was trained with the Trainer class from transformers. The training and evaluation process is described in a Jupyter notebook stored in the following GitHub repository:

https://github.com/fitlemon/whisper-small-uz-en-ru-lang-id
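As a rough illustration of how the hyperparameters reported in this card could map onto the transformers Trainer API, the sketch below collects them into keyword arguments for TrainingArguments. This is not the notebook's code: the mapping of "Native AMP" to fp16=True and the output directory name are assumptions.

```python
from typing import Dict

# Values copied from the "Training hyperparameters" list in this card.
HYPERPARAMS: Dict = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "max_steps": 9000,
    "fp16": True,  # assumed equivalent of "Native AMP"; requires a CUDA GPU
}


def effective_batch_size(params: Dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(output_dir: str):
    """Create TrainingArguments from the values above."""
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


if __name__ == "__main__":
    print(effective_batch_size(HYPERPARAMS))
```

On a single device the effective batch size is 2 × 4 = 8, which matches the total_train_batch_size reported in the card.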

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
training_steps: 9000
mixed_precision_training: Native AMP
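To make these numbers concrete, the sketch below works through the arithmetic they imply: steps per epoch, number of warmup steps, and the learning rate at a given step under a linear schedule with warmup. The schedule formula follows the usual linear warmup and decay shape and is an assumption about what the Trainer applied, not something taken from the notebook. The 24,000-row training set size comes from the data section of this card.

```python
PEAK_LR = 3e-5
TOTAL_STEPS = 9000
WARMUP_RATIO = 0.1
TOTAL_TRAIN_BATCH_SIZE = 8  # 2 per device x 4 gradient accumulation steps
TRAIN_ROWS = 24000          # training set size stated in the data section

# 24000 / 8 = 3000 optimizer steps per pass over the training data.
STEPS_PER_EPOCH = TRAIN_ROWS // TOTAL_TRAIN_BATCH_SIZE
# 10% of 9000 steps are spent warming up.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup to the peak, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("steps per epoch:", STEPS_PER_EPOCH)
    print("epochs:", TOTAL_STEPS / STEPS_PER_EPOCH)
    print("warmup steps:", WARMUP_STEPS)
    for step in (0, 450, 900, 4950, 9000):
        print(step, linear_lr(step))
```

The 3,000 steps per epoch agree with the training results table, where epochs 1, 2, and 3 end at steps 3000, 6000, and 9000.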

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.0252	1	3000	0.3089	0.953	0.9525
0.0357	2	6000	0.1732	0.964	0.9637
0.0	3	9000	0.2065	0.9747	0.9746
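For readers who want to reproduce the Accuracy and F1 columns on their own predictions, the following is a minimal pure-Python sketch of both metrics. The card does not say how F1 was averaged across the three classes, so macro averaging is an assumption here, and the labels in the demo are made-up placeholders.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    scores: List[float] = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder labels for illustration only.
    y_true = ["uz", "uz", "en", "en", "ru", "ru"]
    y_pred = ["uz", "en", "en", "en", "ru", "ru"]
    print(round(accuracy(y_true, y_pred), 4))
    print(round(macro_f1(y_true, y_pred), 4))
```

With a roughly balanced three-class set, accuracy and macro F1 tend to be close to each other, which is consistent with the near-identical values in the table.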

Framework versions

Transformers 4.38.2
Pytorch 2.2.1+cu121
Datasets 2.17.1
Tokenizers 0.15.2

📄 License

This model is licensed under the Apache-2.0 license.

Property	Details
Model Type	Fine-tuned version of openai/whisper-small
Training Data	mozilla-foundation/common_voice_16_1 (uz/en/ru)
License	Apache-2.0
