Phi 4 Mm Inst Asr Singlish

Developed by mjwong
A multimodal speech recognition model optimized for Singapore English (Singlish). It is fine-tuned from Microsoft's Phi-4 multimodal instruction model and improves recognition of Singapore English's distinctive phonetic features.
Downloads 61
Release Date: 5/1/2025

Model Overview

This model addresses the under-representation of regional dialects in general-purpose large language models. It is optimized specifically for the code-switching and distinctive prosody of Singapore English (Singlish), in pursuit of a unified model that listens, understands, and responds naturally.

Model Features

Singapore English Optimization
Specifically optimized for code-switching and unique prosody in Singapore English, significantly improving recognition accuracy.
Multimodal Capabilities
Built on the Phi-4 multimodal instruction model, it can process both audio and text.
Efficient Fine-Tuning
Only the audio-related modules are unfrozen during training, so the model adapts to Singapore English efficiently while keeping its core language understanding.
Smart Termination
Trained with an explicit end token, the model learns where a transcription ends and avoids redundant output.
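
The selective fine-tuning described under Efficient Fine-Tuning can be sketched roughly as follows. This is an illustration only, not the author's training code: the helper name `unfreeze_audio_only` and the choice to match parameter names on the substring "audio" are assumptions. The function works on any object that exposes `named_parameters()`, such as a PyTorch module.

```python
from typing import Tuple


def unfreeze_audio_only(model, keywords: Tuple[str, ...] = ("audio",)) -> Tuple[int, int]:
    """Freeze every parameter except those whose name contains a keyword.

    `model` is anything exposing `named_parameters()` that yields
    (name, parameter) pairs, for example a torch.nn.Module.
    Matching on the substring "audio" is an assumption made for
    illustration, not the author's actual module selection.

    Returns (trainable, frozen) counts of parameter tensors.
    """
    trainable = 0
    frozen = 0
    for name, param in model.named_parameters():
        is_audio = any(key in name for key in keywords)
        # Only audio-related parameters will receive gradient updates.
        param.requires_grad = is_audio
        if is_audio:
            trainable += 1
        else:
            frozen += 1
    return trainable, frozen
```

Calling the helper before building the optimizer, and passing the optimizer only the parameters with `requires_grad` set, keeps the language-model weights untouched while the audio path adapts.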

Model Capabilities

Singapore English Speech Recognition
Multimodal Understanding
Speech Transcription
Voice-First Agent Development

Use Cases

Speech Transcription
Singapore English Conversation Transcription
Transcribes everyday conversations with Singapore English characteristics into text
Word Error Rate (WER) as low as 13.16%
Smart Assistants
Singapore English Voice Assistant
Supports building voice-first assistants that understand Singapore English accents
Achieves a unified 'listen-understand-respond naturally' model
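
For context on the word error rate quoted under Speech Transcription: WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal plain-Python version of that standard definition. The function name is hypothetical, and the text normalization used for the reported figure is not stated in the source, so this version simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("can you help me lah", "can you help me"))  # 0.2
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.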