Whisper-base.kk Open-Source Automatic Speech Recognition Model - Freely Deploy and Accurately Recognize Kazakh Speech

Whisper Base.kk

Developed by akuzdeuov

Whisper-base.kk is an automatic speech recognition (ASR) model for the low-resource Kazakh language, fine-tuned from Whisper-base on the Kazakh Speech Corpus 2 (KSC2), which contains over 1,000 hours of annotated data.

Speech Recognition

Safetensors

Open Source License: Apache-2.0

#Kazakh speech recognition #Low-resource optimization #Industrial-grade corpus

Downloads 43

Release Time : 8/14/2024

Model Overview

This is a speech recognition model supporting only the Kazakh language, based on the Whisper architecture, specifically optimized for Kazakh speech-to-text tasks.

Model Features

Low-resource language optimization

Fine-tuned specifically for Kazakh, a low-resource language, reaching a 15.36% WER on the KSC2 test set.

Industrial-grade corpus training

Trained using over 1,000 hours of industrial-grade Kazakh speech corpus (KSC2).

Long audio processing

Supports processing of arbitrarily long audio inputs through chunking algorithms.

Model Capabilities

Kazakh speech recognition

Long audio transcription

Batch speech processing

Use Cases

Speech transcription

Kazakh meeting minutes

Automatically transcribe Kazakh meeting recordings into text records.

Test set WER 15.36%

Media content subtitle generation

Automatically generate subtitles for Kazakh video content.
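As a rough illustration of the subtitle use case, the sketch below converts timestamped transcription chunks into SRT text. It assumes the chunks come from a Transformers speech recognition pipeline called with return_timestamps=True, which returns a "chunks" list of dicts holding a "timestamp" (start, end) pair in seconds and a "text" string. The helper names format_srt_time and chunks_to_srt, and the fallback_duration_s parameter, are hypothetical and not part of the model card.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks, fallback_duration_s=2.0):
    """Build SRT text from chunks shaped like {"timestamp": (start, end), "text": str}.

    Assumption: the end time of a chunk may be None (for example at the end of
    the audio); in that case a fixed fallback duration is used instead.
    """
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            end = start + fallback_duration_s
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(entries)


# Hand-written example chunks (not real model output).
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Сәлем"},
    {"timestamp": (2.5, None), "text": " әлем"},
]
print(chunks_to_srt(example_chunks))
```

With a pipeline object such as the pipe created under Advanced Usage, the input would be obtained along the lines of pipe("path_to_audio", return_timestamps=True)["chunks"].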

🚀 Whisper

A Whisper-base model for automatic speech recognition (ASR), tailored to the low-resource Kazakh language. It offers an effective way to transcribe Kazakh speech, drawing on more than 1,000 hours of training data.

🚀 Quick Start

This model is a fine-tuned Whisper-base for automatic speech recognition of the Kazakh language. It was trained on the Kazakh Speech Corpus 2, which has more than 1k hours of labeled data, and achieved a 15.36% WER on the test set.

✨ Features

Language-Specific: It is a Kazakh-only model, dedicated to high-quality Kazakh speech recognition.
Long-Form Transcription: Through a chunking algorithm, it can transcribe audio of arbitrary length.

💻 Usage Examples

Basic Usage

>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import librosa

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("akuzdeuov/whisper-base.kk")
>>> model = WhisperForConditionalGeneration.from_pretrained("akuzdeuov/whisper-base.kk")

>>> # load your audio
>>> audio, sampling_rate = librosa.load("path_to_audio", sr=16000)
>>> input_features = processor(audio, sampling_rate=sampling_rate, return_tensors="pt").input_features 

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text, keeping special tokens
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)

>>> # decode again, this time stripping special tokens
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

Setting skip_special_tokens=True removes the context tokens from the start of the transcription.

Advanced Usage

>>> import torch
>>> from transformers import pipeline

>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> pipe = pipeline(
...   "automatic-speech-recognition",
...   model="akuzdeuov/whisper-base.kk",
...   chunk_length_s=30,
...   device=device,
... )

>>> prediction = pipe("path_to_audio", batch_size=8)["text"]

Whisper is designed to work on audio samples of up to 30 s in duration. With a chunking algorithm, however, it can transcribe audio of arbitrary length. The Transformers pipeline supports this: chunking is enabled by setting chunk_length_s=30 when instantiating the pipeline, and with chunking enabled the pipeline can also run batched inference (batch_size=8 in the example above).
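To make the idea of chunking concrete, here is a minimal sketch that computes the sample ranges of overlapping 30 s windows over a long recording. It is a simplification, not the pipeline's actual implementation: the function name chunk_bounds and the overlap_s parameter are hypothetical, and the real pipeline also merges the overlapping transcriptions, which this sketch leaves out.

```python
def chunk_bounds(num_samples, sampling_rate=16000, chunk_length_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices of overlapping windows covering the audio.

    Assumption: neighbouring windows overlap by overlap_s seconds so that words
    cut at a window edge also appear whole in the adjacent window.
    """
    chunk = int(chunk_length_s * sampling_rate)
    step = chunk - int(overlap_s * sampling_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_length_s must be positive and larger than overlap_s")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


# A 70 s recording at 16 kHz is covered by three windows.
print(chunk_bounds(70 * 16000))
# [(0, 480000), (400000, 880000), (800000, 1120000)]
```

Each window is at most 30 s long, so it fits the model's input limit, and windows can be grouped into batches for inference.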

📚 Documentation

Property	Details
Model Type	whisper-base.kk
Training Data	Kazakh Speech Corpus 2 with over 1k hours of labelled data
Task	Automatic Speech Recognition
Dataset	Kazakh Speech Corpus 2 (KSC2), type: librispeech_asr, config: clean, split: test, language: kk
Metrics	Test WER: 15.36%
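For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The sketch below is a plain implementation of that definition. The function name word_error_rate is hypothetical, the example sentence is made up rather than taken from KSC2, and the source does not state what text normalization was applied when the 15.36% figure was computed.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of four gives a WER of 0.25.
print(word_error_rate("бүгін ауа райы жақсы", "бүгін ауа райы жаксы"))
```

A test WER of 15.36% therefore corresponds to roughly 15 word-level errors per 100 reference words.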

📄 License

This project is licensed under the Apache-2.0 license.
