
Phi 4 Multimodal Instruct Ko Asr

Developed by junnei
A Korean automatic speech recognition (ASR) and automatic speech translation (AST) model fine-tuned from microsoft/Phi-4-multimodal-instruct and evaluated on the zeroth-korean and fleurs datasets.
Downloads 354
Release Time: 3/5/2025

Model Overview

This model targets Korean speech recognition and speech translation. It is fine-tuned to improve recognition accuracy and translation quality on Korean speech.

Model Features

High-performance Korean recognition
Achieves a character error rate (CER) of 1.316% and a word error rate (WER) of 2.951% on the zeroth-korean test set.
Multi-task support
Supports both automatic speech recognition (ASR) and speech translation (AST) in a single model.
Optimized training
Fine-tuned for 960 steps on an H100 GPU to strengthen Korean language processing.
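
For readers unfamiliar with the CER and WER figures reported above, the following is a minimal sketch of how such error rates are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, expressed as a percentage. The function names `edit_distance` and `error_rate` are hypothetical, and the choice to strip spaces before counting characters is an assumption; the listing does not say which scoring script or text normalization produced the published numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate in percent.

    unit="word" gives WER; unit="char" gives CER.
    Assumption: spaces are removed before counting characters.
    """
    if unit == "word":
        tokenize = lambda s: s.split()
    else:
        tokenize = lambda s: list(s.replace(" ", ""))
    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * errors / total


# Illustrative sentence pair (not taken from zeroth-korean):
# one syllable differs, so 1 of 2 words and 1 of 10 characters are wrong.
refs = ["안녕하세요 반갑습니다"]
hyps = ["안녕하세요 반갑습니까"]
print(error_rate(refs, hyps, "word"))  # 50.0
print(error_rate(refs, hyps, "char"))  # 10.0
```

Because a single wrong syllable invalidates the whole word, WER is typically higher than CER on the same output, which matches the relationship between the two figures reported here.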

Model Capabilities

Korean speech recognition
Korean-English speech translation
English-Korean speech translation

Use Cases

Speech transcription
Korean meeting minutes
Real-time transcription of Korean meeting recordings into text
Achieves a character error rate (CER) of 1.316% on the zeroth-korean test set.
Speech translation
Korean-English real-time translation
Real-time translation of Korean speech into English text
Achieves a BLEU score of 67.659 on the fleurs Korean test set.