whisper-small-ko-low-qual-voice Open-source Korean Speech Recognition Model - Accurately recognize Korean speech content

Whisper Small Ko Low Qual Voice

Developed by kimthegarden

A Korean automatic speech recognition (ASR) model built on the Whisper-small architecture and fine-tuned for Korean speech, with strong reported performance on its validation sets.

Speech Recognition

Safetensors

Korean

Open Source License: MIT

#Korean speech recognition #High-precision transcription #Offline batch processing

Downloads 211

Release Time: 7/2/2025

Model Overview

This model is a Korean automatic speech recognition (ASR) model based on the Whisper-small architecture. It targets general-domain Korean speech such as conversations, broadcasts, and news.

Model Features

Accurate recognition

Transcribes Korean speech content, with strong reported results on the model's validation sets.

Suitable for multiple scenarios

Can be used for offline or batch transcription of Korean speech data and can also be integrated into Korean speech assistant systems.

Highly scalable

Supports further fine-tuning on specific domain datasets, such as law, medicine, education, etc.

Model Capabilities

Korean speech recognition

Speech transcription

Speech assistant integration

Use Cases

Speech transcription

Offline speech transcription

Used for batch transcription of Korean speech data.

Speech assistant integration

Integrated into Korean speech assistant systems.

Domain-specific applications

Legal domain

Can be further fine-tuned on legal-domain datasets for legal speech transcription.

Medical domain

Can be further fine-tuned on medical-domain datasets for medical speech transcription.

🚀 Model Card for whisper-small-ko-finetuned

This is a fine-tuned model based on `SungBeom/whisper-small-ko`, specialized for Korean automatic speech recognition, delivering excellent performance on validation sets.

🚀 Quick Start

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# Replace the placeholder ID with the actual Hub repository (see Model Sources).
model_id = "your-username/whisper-small-ko-finetuned"
model = WhisperForConditionalGeneration.from_pretrained(model_id)
processor = WhisperProcessor.from_pretrained(model_id)

# Input: mono 16 kHz waveform (float32 NumPy array or tensor).
# `audio_waveform` is not defined here; load it from your own audio first.
inputs = processor(audio_waveform, sampling_rate=16000, return_tensors="pt")

# Generate token IDs without tracking gradients (inference only).
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

# Decode token IDs to text, dropping special tokens.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
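
Whisper models consume audio in windows of up to 30 seconds, so longer recordings are usually split before they reach the processor. The following is a minimal sketch of computing chunk boundaries in samples, assuming 16 kHz mono input as in the snippet above. `chunk_bounds` is a hypothetical helper written for illustration; it is not part of Transformers and is not taken from this model's repository.

```python
SAMPLE_RATE = 16_000   # sampling rate expected by the Whisper processor
CHUNK_SECONDS = 30     # length of Whisper's input window


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) sample offsets that cover the waveform in order.

    Hypothetical helper: it cuts on fixed boundaries with no overlap, so a
    word that straddles a boundary may be split between two chunks.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk_len = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 70-second recording gives two full chunks and one 10-second remainder.
bounds = chunk_bounds(70 * SAMPLE_RATE)
```

Each slice `audio_waveform[start:end]` can then be passed to the processor in place of the full waveform. Splitting at silences or using overlapping windows is likely to give cleaner transcripts than fixed cuts; this sketch shows only the simplest case.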

✨ Features

Performs automatic speech recognition (ASR) for Korean audio data.
Can be used for offline or batch transcription of Korean speech data.
Suitable for integration into Korean-language voice assistant systems.
Allows for further fine-tuning on domain-specific datasets.

📚 Documentation

Model Details

Model Description

This model is based on the Whisper-small architecture and fine-tuned on 62,327 Korean audio-transcript pairs using Hugging Face Transformers and PyTorch. It is designed for general-domain Korean speech recognition (conversational, broadcast, news, etc.).

Property	Details
Developed by	Jeongwon Kim
Shared by	kimthegarden
Model Type	Encoder-decoder Transformer (WhisperForConditionalGeneration)
Language(s)	Korean (`ko`)
License	MIT
Fine-tuned from model	`SungBeom/whisper-small-ko`

Model Sources

Repository: https://huggingface.co/kimthegarden/whisper-small-ko-low-qual-voice
Notebook: Fine-tuned using a custom notebook, whisper_finetuning.ipynb

Uses

Direct Use

Korean automatic speech recognition (ASR)
Offline or batch transcription of Korean speech data
Integration into Korean - language voice assistant systems
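
For offline or batch transcription, a common pattern is to walk a list of audio files in fixed-size batches and collect one transcript per file. The sketch below shows only that control flow. `transcribe_files` and the `transcribe_batch` callback are hypothetical names introduced here; in practice the callback would wrap the processor and `model.generate` calls from the Quick Start, and the batch size of 8 is an arbitrary default.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def transcribe_files(paths, transcribe_batch, batch_size=8):
    """Map each path to its transcript, processing one batch at a time.

    transcribe_batch is a caller-supplied function (an assumption of this
    sketch) that takes a list of paths and returns one string per path.
    """
    results = {}
    for batch in batched(paths, batch_size):
        texts = transcribe_batch(batch)
        if len(texts) != len(batch):
            raise ValueError("transcribe_batch must return one text per path")
        results.update(zip(batch, texts))
    return results


# Stand-in for a real model call, used only to show the control flow.
def fake_transcribe(batch):
    return [f"<transcript of {path}>" for path in batch]


files = [f"clip_{i:03d}.wav" for i in range(5)]
transcripts = transcribe_files(files, fake_transcribe, batch_size=2)
```

Keeping the model call behind a callback makes the batching logic testable without downloading weights, and lets the batch size be tuned to the available memory.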

Downstream Use

Further fine-tuning on domain-specific datasets (e.g. legal, medical, education)
Research into Korean ASR model robustness or multilingual Whisper models

Out-of-Scope Use

Transcription of non-Korean speech (this model is Korean-only)
Real-time streaming ASR (not latency-optimized)
Zero-shot or few-shot adaptation to other languages

Bias, Risks, and Limitations

The model may show reduced accuracy on:
- Regional dialects or accents not represented in the training data
- Very noisy environments
- Children’s speech or non-native pronunciation
The model has not been tested for fairness across different speakers (gender, age, etc.).

⚠️ Important Note

The model may be less accurate on regional dialects, in noisy environments, and on children's speech. Fairness across different speakers has also not been tested.

💡 Usage Tip

We recommend testing the model on your specific data domain before deployment. Additional fine-tuning or data filtering may be required for sensitive use cases (e.g. education, healthcare).
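
One way to run such a test is to compare model output against reference transcripts from your own domain using character error rate (CER), a metric often used for Korean ASR because word boundaries depend on spacing conventions. The sketch below is a plain-Python illustration, not the evaluation used by the model author. The function names are hypothetical, and removing whitespace and applying NFC normalization are choices made for this example.

```python
import unicodedata


def _normalize(text):
    # NFC composes Hangul jamo into syllable blocks; whitespace is dropped
    # so that spacing differences do not count as errors.
    return "".join(unicodedata.normalize("NFC", text).split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (r != h),  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    ref = _normalize(reference)
    hyp = _normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One substituted syllable out of nine reference syllables.
score = cer("오늘 날씨가 좋습니다", "오늘 날씨가 좋습니까")
```

Averaging this score over a held-out sample of your own recordings gives a rough indication of whether additional fine-tuning is needed before deployment.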