Whisper-Large-v3-turbo-STT-Zeroth-KO-v2 Open Source Model - High-Accuracy Korean Speech Transcription with Timestamps

Whisper Large V3 Turbo STT Zeroth KO V2

Developed by o0dimplz0o

A Korean automatic speech recognition model fine-tuned from Whisper Large v3 Turbo, aiming to provide high-accuracy transcription with timestamps

Speech Recognition

Transformers

Korean #Korean speech transcription #High-precision timestamps #Incremental fine-tuning

Downloads 662

Release Time: 2/3/2025

Model Overview

This model is an optimized version of openai/whisper-large-v3-turbo, specifically fine-tuned for Korean automatic speech recognition (ASR) tasks, aiming to provide high-accuracy speech transcription.

Model Features

Korean Optimization

Fine-tuned specifically for Korean speech recognition, with the aim of improving transcription accuracy on Korean audio

Timestamp Support

Transcription results include timestamp information for easy audio content localization

Incremental Fine-tuning

Fine-tuned incrementally in stages; fine-tuning is still in progress

Data Augmentation

Applies 20% random data augmentation during training to improve model robustness

Model Capabilities

Korean speech recognition

Timestamped transcription

High-accuracy speech-to-text

Use Cases

Speech Transcription

Korean Meeting Minutes

Automatically transcribe Korean meeting recordings into timestamped text

Reported evaluation-set results: word error rate (WER) 19.9134, character error rate (CER) 0.0660

Korean Media Subtitle Generation

Automatically generate subtitles for Korean video content

Speech Analysis

Korean Speech Content Analysis

Analyze Korean speech content to extract key information
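For the subtitle use case above, timestamped output can be converted into the SubRip (SRT) format that most video players accept. The following is a minimal sketch, not the author's tooling. It assumes each transcription chunk is a dictionary of the form `{"timestamp": (start_seconds, end_seconds), "text": "..."}`; the function names `srt_time` and `chunks_to_srt` are hypothetical.

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Build SRT text from timestamped chunks.

    Assumed chunk shape (an assumption, not taken from the model card):
    {"timestamp": (start_seconds, end_seconds), "text": "..."}
    """
    blocks = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        # Skip chunks that cannot be placed on the timeline or are empty.
        if start is None or end is None or not text:
            continue
        index = len(blocks) + 1  # SRT cues are numbered from 1
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Writing the returned string to a `.srt` file with UTF-8 encoding is usually enough for players to display Korean text correctly.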

🚀 Fine-Tuned-Whisper-Large-v3-Turbo-STT-Zeroth-KO-v2

This is a fine-tuned model based on openai/whisper-large-v3-turbo, specifically optimized for Korean automatic speech recognition tasks, aiming to provide high-accuracy, timestamped transcriptions.

🚀 Quick Start

This section provides a brief overview of the model and its current performance.

✨ Features

Based on openai/whisper-large-v3-turbo, incrementally fine-tuned for Korean ASR.
Aims to achieve high accuracy and timestamped transcriptions for Korean speech.

📚 Documentation

Model Information

Property	Details
Library Name	transformers
Metrics	wer, cer
Model Name	Fine-Tuned-Whisper-Large-v3-Turbo-STT-Zeroth-KO-v2
Base Model	openai/whisper-large-v3-turbo
Pipeline Tag	automatic-speech-recognition
Language	ko
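Given the library (`transformers`) and pipeline tag (`automatic-speech-recognition`) listed above, loading the model might look like the sketch below. The repository id in `MODEL_ID` is an assumption pieced together from the developer name and page title, so verify it on the Hugging Face Hub before use. The helpers `transcribe` and `format_chunks` are hypothetical names, and the chunk layout reflects what the Whisper pipeline generally returns with `return_timestamps=True`.

```python
# Assumed repository id (not confirmed by the model card text).
MODEL_ID = "o0dimplz0o/Whisper-Large-v3-turbo-STT-Zeroth-KO-v2"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> dict:
    """Transcribe one audio file; returns {"text": ..., "chunks": [...]}."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(
        audio_path,
        return_timestamps=True,  # request segment-level timestamps
        generate_kwargs={"language": "korean", "task": "transcribe"},
    )


def _label(seconds) -> str:
    """Render a timestamp, tolerating a missing (None) boundary."""
    return "?" if seconds is None else f"{seconds:.2f}"


def format_chunks(result: dict) -> list[str]:
    """Render each timestamped chunk as '[start-end] text'."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{_label(start)}-{_label(end)}] {chunk['text'].strip()}")
    return lines
```

A call such as `format_chunks(transcribe("meeting.wav"))` would then yield one line per segment; `meeting.wav` is a placeholder file name, and decoding audio from a path typically requires ffmpeg to be installed.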

Evaluation Results

This model is still being fine-tuned from openai/whisper-large-v3-turbo on a custom dataset. It currently achieves the following results on the evaluation set:

Loss: 0.0164
WER: 19.9134
CER: 0.0660
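To make the two metrics concrete, the sketch below computes word error rate and character error rate as edit distance divided by reference length. It is an illustration only: the model card does not say which tool produced the reported numbers, how text was normalized, or whether the values are fractions or percentages. Here words are split on whitespace and characters include spaces, both of which are assumptions.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction, with words split on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For the reference `"오늘 날씨가 좋습니다"` and the hypothesis `"오늘 날씨 좋습니다"`, a single dropped particle makes one of three words wrong (WER 1/3) but only one of eleven characters wrong (CER 1/11), which is one reason WER tends to run well above CER on Korean text.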

Model Description

This model is a version of openai/whisper-large-v3-turbo that is still being incrementally fine-tuned in stages and is optimized specifically for Korean automatic speech recognition (ASR). The fine-tuning process aims to deliver high-accuracy, timestamped transcriptions of Korean speech.

Dataset Details

Dataset Source: Custom dataset (https://huggingface.co/datasets/o0dimplz0o/Zeroth-STT-Korean)
Number of Samples: 102,263
Split: 93% train, 7% test
Data Augmentation: 20% random, applied only to the training set
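The split and augmentation settings above can be sketched as follows. This is one possible reading, not the author's pipeline: it assumes "20% random" means each training sample is augmented with probability 0.2, and it uses additive Gaussian noise as a stand-in because the model card does not name the augmentation applied. `split_sizes`, `add_noise`, and `maybe_augment` are hypothetical helpers.

```python
import random

NUM_SAMPLES = 102_263  # from the dataset details
TEST_PERCENT = 7       # 93% train, 7% test
AUGMENT_PROB = 0.20    # assumed meaning of "20% random"


def split_sizes(n: int, test_percent: int) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test share to the nearest sample."""
    n_test = (n * test_percent + 50) // 100
    return n - n_test, n_test


def add_noise(samples: list[float], rng: random.Random, scale: float = 0.005) -> list[float]:
    """Placeholder augmentation: add small Gaussian noise to a waveform."""
    return [s + rng.gauss(0.0, scale) for s in samples]


def maybe_augment(samples: list[float], rng: random.Random, is_training: bool) -> list[float]:
    """Augment training samples with probability AUGMENT_PROB; never touch test samples."""
    if is_training and rng.random() < AUGMENT_PROB:
        return add_noise(samples, rng)
    return samples
```

With these numbers, `split_sizes(NUM_SAMPLES, TEST_PERCENT)` gives roughly 95,105 training and 7,158 test samples; the exact counts depend on how the author rounded the split.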

Training Details

Hardware: L40S GPU
Learning Rate Scheduler: Cosine
Epochs: [pending completion]
Optimizer: AdamW Torch Fused
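As a rough illustration of the cosine scheduler listed above, the sketch below decays the learning rate from a peak value to a minimum along half a cosine wave. The peak learning rate, total step count, and absence of warmup are all assumptions, since the model card gives none of them. In Hugging Face `TrainingArguments`, the listed settings would normally correspond to `lr_scheduler_type="cosine"` and `optim="adamw_torch_fused"`.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate without warmup.

    peak_lr and total_steps are hypothetical inputs; the model card
    does not report the values used in training.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The schedule starts at `peak_lr`, passes through the midpoint between `peak_lr` and `min_lr` halfway through training, and ends at `min_lr`.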
