Xlsr Timit B0
A phoneme transcription model fine-tuned on the TIMIT dataset that transcribes English audio into phoneme sequences.
Release Time: 11/30/2024
Model Overview
This model starts from the pre-trained checkpoint ginic/data_seed_4_wav2vec2-large-xlsr-buckeye-ipa and is fine-tuned on the DARPA TIMIT English corpus. It transcribes English audio into phoneme sequences and, as of its release, is reported to outperform other XLSR models on English phonetic transcription.
Model Features
High-precision phoneme transcription
Achieves an average character error rate (CER) of 0.113 on the TIMIT test set
English optimization
Fine-tuned specifically on English speech for accurate phoneme transcription
Based on XLSR architecture
Built on the wav2vec2-large-xlsr architecture, whose cross-lingual pre-training supplies the speech feature extraction
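The character error rate (CER) quoted above is conventionally the Levenshtein edit distance between the predicted and reference transcriptions, divided by the reference length. The following is a minimal sketch of that calculation, not the evaluation script used for this model. It assumes each phoneme symbol is a single Unicode character and that the "average" is a per-utterance mean; the listing does not say how the 0.113 figure was aggregated. The function names are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER for one utterance: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def mean_cer(pairs) -> float:
    """Per-utterance mean CER (assumed aggregation, see note above)."""
    pairs = list(pairs)
    return sum(character_error_rate(r, h) for r, h in pairs) / len(pairs)


# One substituted symbol out of five reference symbols.
print(character_error_rate("ðəkæt", "ðəkat"))  # 0.2
```

Under this definition, a CER of 0.113 corresponds to roughly 11 character-level errors per 100 reference characters.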
Model Capabilities
English speech recognition
Phoneme transcription
Automatic speech transcription
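The capabilities listed above can be exercised through the Hugging Face Transformers speech recognition pipeline, which is the usual way to run wav2vec2-based checkpoints. This is a hedged sketch only: the listing does not give the model's Hub identifier, so the `MODEL_ID` value is a hypothetical placeholder, the helper names are illustrative, and the code assumes the checkpoint ships a tokenizer and feature extractor compatible with the pipeline. Decoding audio from a file path also requires ffmpeg to be installed.

```python
# Hypothetical placeholder: replace with the model's actual Hub identifier.
MODEL_ID = "your-namespace/xlsr-timit-b0"


def load_asr(model_id: str = MODEL_ID):
    """Build a speech recognition pipeline for the given checkpoint."""
    # Imported lazily so this module can be loaded without transformers.
    from transformers import pipeline

    return pipeline(task="automatic-speech-recognition", model=model_id)


def transcribe_phonemes(asr, audio_path: str) -> str:
    """Run the pipeline on one audio file and return the phoneme string."""
    # The pipeline returns a dict whose "text" field holds the decoded output.
    result = asr(audio_path)
    return result["text"]


if __name__ == "__main__":
    asr = load_asr()
    # "sample.wav" is an illustrative path to an English recording.
    print(transcribe_phonemes(asr, "sample.wav"))
```

Keeping the pipeline object separate from the transcription call lets the model be loaded once and reused across many recordings.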
Use Cases
Phonetics research
Phoneme analysis
Used for phoneme feature analysis in phonetics research
Provides accurate phoneme transcription results
Speech technology development
Speech recognition system development
Serves as a phoneme transcription component for speech recognition systems
Improves system accuracy in recognizing English phonemes