Phi 4 Multimodal Instruct Ko Asr
A Korean automatic speech recognition (ASR) and speech translation (AST) model fine-tuned from microsoft/Phi-4-multimodal-instruct, reporting strong results on the zeroth-korean and fleurs datasets.
Downloads 354
Release Time: 3/5/2025
Model Overview
This model targets Korean speech recognition and speech translation. Fine-tuning improves recognition accuracy and translation quality on Korean speech.
Model Features
High-performance Korean recognition
Achieves a character error rate (CER) of 1.316% and a word error rate (WER) of 2.951% on the zeroth-korean test set.
Multi-task support
Supports both automatic speech recognition (ASR) and speech translation (AST) in a single model.
Optimized training
Fine-tuned for 960 steps on an H100 GPU to improve Korean speech processing.
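For readers unfamiliar with the CER and WER figures quoted above, the following is a minimal sketch of how such error rates are commonly computed: the edit distance between the reference transcript and the model output, divided by the reference length, expressed as a percentage. This is not the evaluation script used for this model. The function names, the example sentences, and the choice to strip spaces before computing CER are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces are ignored)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Hypothetical transcripts: one syllable differs between them.
reference = "오늘 회의는 세 시에 시작합니다"
hypothesis = "오늘 회의는 네 시에 시작합니다"

print(f"WER: {wer(reference, hypothesis):.3f}%")
print(f"CER: {cer(reference, hypothesis):.3f}%")
```

Because Korean words are separated by spaces but each word contains several syllable blocks, a single wrong syllable counts as one full word error but only one character error. This is one reason CER is usually lower than WER on the same output.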
Model Capabilities
Korean speech recognition
Korean-English speech translation
English-Korean speech translation
Use Cases
Speech transcription
Korean meeting minutes
Real-time transcription of Korean meeting recordings into text
Achieves a character error rate (CER) of 1.316% on the zeroth-korean test set.
Speech translation
Korean-English real-time translation
Real-time translation of Korean speech into English text
Achieves a BLEU score of 67.659 on the fleurs Korean test set.
© 2025 AIbase