Whisper Finetune Teochew
A Teochew (Chaoshan) speech recognition model fine-tuned from Whisper-medium that produces orthographic transcriptions across multiple regional accents
Release Time: 3/17/2025
Model Overview
This model is designed for automatic speech recognition of the Teochew (Chaoshan) dialect. Its transcripts use the Dai Kan orthography, an annotation scheme chosen to avoid homophone ambiguity.
Model Features
Multi-dialect support
Covers several accents, including the pronunciations of the Teochew prefectural city, urban Shantou, southern Chao'an, Chenghai, and Rongjiang
Dai Kan orthography
Employs an innovative annotation scheme to resolve homophone ambiguity (e.g., writing a dedicated character in place of the easily confused 「个」)
Field recording data
Trained on 18.9 hours of real-world recordings containing 12,500 annotated samples
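As a rough sanity check on the training data, and assuming the 18.9 hours are spread across all 12,500 annotated samples, the average clip length works out as below. This is a back-of-the-envelope figure derived from the two numbers above, not a statistic reported for the dataset.

```latex
\bar{t} = \frac{18.9\,\text{h} \times 3600\,\text{s/h}}{12{,}500\ \text{samples}}
        = \frac{68{,}040\,\text{s}}{12{,}500}
        \approx 5.44\,\text{s per sample}
```

Short clips of this length sit well inside a single Whisper input window, so most samples would not need to be split.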
Model Capabilities
Teochew speech-to-text
Multi-accent recognition
Orthographic transcription
Use Cases
Dialect preservation
Teochew speech archiving
Converting orally transmitted Teochew recordings into standardized written records
Character error rate (CER): 12.254% on the test set
Voice interaction
Dialect voice assistant
Supporting Teochew voice input for smart device interaction
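The test-set CER quoted above is conventionally computed as the character-level edit distance (substitutions + deletions + insertions) between the model output and the reference transcript, divided by the number of reference characters. The sketch below assumes that standard definition with corpus-level aggregation. The card does not say how the 12.254% figure was aggregated or whether punctuation and spaces were normalized first. The function names `edit_distance` and `character_error_rate` are illustrative only, not part of the model's tooling.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    if chars == 0:
        raise ValueError("reference text is empty")
    return edits / chars


if __name__ == "__main__":
    refs = ["abcd", "潮州话"]
    hyps = ["abed", "潮州"]  # one substitution, one deletion
    print(f"{character_error_rate(refs, hyps):.3%}")  # prints 28.571%
```

Corpus-level aggregation weights long transcripts more heavily than averaging per-sample CERs would, so the two methods can give noticeably different numbers on the same test set.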
© 2025 AIbase