Unispeech 1350 En 17h Ky Ft 1h

Developed by Microsoft
A speech recognition model based on Microsoft's UniSpeech architecture, specifically fine-tuned for the Kyrgyz language
Downloads 39
Release Time: 3/2/2022

Model Overview

This is a large-scale model pre-trained on speech audio sampled at 16 kHz together with phoneme labels, then fine-tuned on 1 hour of Kyrgyz phoneme data. It is intended primarily for automatic speech recognition in Kyrgyz, and input audio should also be sampled at 16 kHz.
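A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under the id shown (inferred from the model name, not stated on this page) and that the `transformers` and `torch` packages are installed. The helper name `transcribe` is hypothetical. It rejects audio that is not sampled at 16 kHz before loading anything.

```python
# Assumed Hub id, inferred from the model name; verify before use.
MODEL_ID = "microsoft/unispeech-1350-en-17h-ky-ft-1h"
EXPECTED_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(waveform, sample_rate: int) -> str:
    """Decode one mono waveform (1-D sequence of floats) to a phoneme string."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; resample first"
        )

    # Imported lazily so the sample-rate check works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForCTC.from_pretrained(MODEL_ID)

    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits  # shape: (batch, frames, vocab)

    predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]
```

Because the model was fine-tuned on phoneme data, the decoded output is likely to be a phoneme sequence rather than orthographic Kyrgyz text; mapping phonemes to words would need a separate lexicon or language model.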

Model Features

Multitask Learning
Combines supervised phoneme CTC learning and phoneme-aware contrastive self-supervised learning
Cross-lingual Generalization
Enhances cross-lingual and cross-domain generalization through unified pre-training methods
Efficient Fine-tuning
Requires only 1 hour of Kyrgyz phoneme data for fine-tuning
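
One way to picture the multitask objective is as a weighted sum of the supervised CTC loss and the contrastive loss on labeled utterances. This is only a sketch: the mixing weight \alpha and the symbol names are assumptions for illustration and are not given on this page.

```latex
\mathcal{L} = \alpha \, \mathcal{L}_{\mathrm{CTC}} + (1 - \alpha) \, \mathcal{L}_{\mathrm{contrastive}},
\qquad 0 \le \alpha \le 1
```

With \alpha = 1 the objective reduces to purely supervised CTC training, and with \alpha = 0 to purely self-supervised contrastive training.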

Model Capabilities

Kyrgyz speech recognition
Phoneme sequence prediction
Cross-lingual speech representation learning
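
Phoneme sequence prediction with a CTC-trained model typically ends with a decoding step that turns per-frame labels into a phoneme sequence. The following is a sketch of greedy CTC decoding, not this model's actual tokenizer: the vocabulary, the blank id, and the frame labels are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_phoneme: Dict[int, str],
    blank_id: int = 0,
) -> List[str]:
    """Collapse consecutive repeated labels, then drop CTC blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [id_to_phoneme[label] for label in collapsed if label != blank_id]


# Hypothetical vocabulary and per-frame argmax ids, for illustration only.
vocab = {0: "<blank>", 1: "s", 2: "a", 3: "l", 4: "m"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0, 2, 2, 4, 4, 0]

print(ctc_greedy_decode(frames, vocab))
```

The blank label is what lets CTC emit the same phoneme twice in a row: `[1, 1, 0, 1]` decodes to two `s` phonemes, while `[1, 1, 1]` decodes to one.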

Use Cases

Speech Recognition
Kyrgyz Speech-to-Text
Convert Kyrgyz speech into phoneme sequences or text
Compared with self-supervised pre-training and with supervised transfer learning, the UniSpeech approach reduces phoneme error rate by up to 13.4% and 17.8% relative, respectively
Speech Technology Research
Cross-lingual Speech Representation Research
Used to study the cross-lingual transfer capabilities of speech representations
Achieves a 6% relative reduction in word error rate on a domain-shift speech recognition task
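
The error-rate figures in the use cases above are relative reductions, not absolute percentage points. The sketch below shows how a phoneme error rate and a relative reduction are commonly computed. The function names are hypothetical, and the numbers in the example are invented to illustrate the arithmetic; they are not measured results for this model.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance divided by the number of reference phonemes."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate that was removed."""
    return (baseline - improved) / baseline


# Invented example: one deleted phoneme out of five reference phonemes.
per = phoneme_error_rate(["s", "a", "l", "a", "m"], ["s", "a", "l", "m"])
print(per)

# Invented error rates: a drop from 20.0% to 17.32% is a 13.4% relative reduction.
print(relative_reduction(0.20, 0.1732))
```

A 13.4% relative reduction therefore means the new error rate is 86.6% of the baseline, which in this invented case is a drop of 2.68 percentage points.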