wav2vec2-large-lv60_phoneme-timit_english_timit-4k_002: Open Source Model for English Phoneme Recognition

Wav2vec2 Large Lv60 Phoneme Timit English Timit 4k 002

Developed by excalibur12

An English phoneme recognition model, fine-tuned from facebook/wav2vec2-large-lv60 on the TIMIT dataset, that achieves a phoneme error rate (PER) of 10.53%.

Speech Recognition

Transformers

English

Open Source License: Apache-2.0

#Phoneme Recognition #TIMIT Dataset #Low PER

Downloads 103

Release Time: 6/17/2024

Model Overview

This model is designed for English phoneme recognition. It is trained on the TIMIT phoneme set and is suited to speech processing and analysis applications.

Model Features

High-Accuracy Phoneme Recognition

Achieves a phoneme error rate (PER) of 10.53% on the TIMIT test set.
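For readers unfamiliar with the metric, PER is conventionally the total edit distance between reference and predicted phoneme sequences, divided by the total number of reference phonemes. The sketch below illustrates that standard definition in plain Python. It is not the evaluation script used for this model, and the function names and sample sequences are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phoneme missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra phoneme in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit errors divided by total reference phonemes, over a corpus."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    return total_errors / total_ref


# Illustrative sequences only (not taken from TIMIT transcripts).
refs = [["sh", "iy", "hh", "ae", "d"], ["d", "aa", "r", "k"]]
hyps = [["sh", "iy", "ae", "d"], ["d", "aa", "r", "k", "s"]]
per = phoneme_error_rate(refs, hyps)  # one deletion + one insertion over 9 phonemes
```

Errors are pooled over the whole corpus before dividing, so long utterances weigh more than short ones.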

Comprehensive Phoneme Coverage

Supports the complete TIMIT phoneme set, including vowels, stops, affricates, fricatives, nasals, and approximants/glides.

Optimized Training Process

Uses a linear learning rate schedule and native AMP (automatic mixed precision) training for efficient training.
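As a rough illustration of what a linear learning rate schedule does, the sketch below ramps the rate up during an optional warmup and then decays it linearly to zero. The peak rate, warmup length, and total step count are hypothetical placeholders, since this page does not list the actual hyperparameters.

```python
def linear_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: scale from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: scale from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical values chosen only for illustration.
PEAK_LR = 1e-4
WARMUP_STEPS = 300
TOTAL_STEPS = 3000

schedule = [linear_lr(s, TOTAL_STEPS, PEAK_LR, WARMUP_STEPS) for s in (0, 300, 1650, 3000)]
```

With these placeholder values the rate starts at zero, reaches the peak at step 300, is half the peak at step 1650, and returns to zero at step 3000.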

Model Capabilities

English Phoneme Recognition

Speech Feature Analysis

Phoneme Classification

Use Cases

Speech Processing

Speech Recognition Preprocessing

Serves as a front-end processing module for speech recognition systems, providing phoneme-level analysis results.
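wav2vec2 models fine-tuned for recognition typically emit one label per audio frame and rely on CTC decoding to turn those frames into a phoneme sequence. Assuming this model follows that pattern, the sketch below shows the greedy collapse step in plain Python: merge consecutive repeats, then drop the blank symbol. The blank token name `<pad>` and the frame labels are assumptions for illustration, not values read from this model's vocabulary.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_labels, blank="<pad>"):
    """Collapse per-frame labels into a phoneme sequence (greedy CTC decoding)."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank symbol that separates repeated phonemes.
    return [label for label in merged if label != blank]


# Hypothetical frame-level predictions for a short stretch of audio.
frames = ["<pad>", "sh", "sh", "<pad>", "iy", "iy", "iy", "<pad>", "<pad>", "iy"]
phonemes = ctc_greedy_collapse(frames)
```

The blank between the two runs of `iy` is what lets the decoder keep both occurrences instead of merging them into one.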

Phoneme error rate of 10.53%

Pronunciation Assessment

Used for evaluating pronunciation accuracy in language learning applications.
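One simple way to build pronunciation feedback on top of a phoneme recognizer is to compare the recognized sequence against the expected one and report where they differ. The sketch below does this with Python's standard `difflib`. The function name and the sample sequences are hypothetical, and `SequenceMatcher` produces a readable alignment rather than a guaranteed minimum-edit one.

```python
from difflib import SequenceMatcher


def pronunciation_diff(expected, recognized):
    """List the spans where the recognized phonemes differ from the expected ones."""
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    issues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is "replace", "delete", or "insert"
            issues.append((tag, expected[i1:i2], recognized[j1:j2]))
    return issues


# Hypothetical example: a learner says "sink" where "think" was expected.
expected = ["th", "ih", "ng", "k"]
recognized = ["s", "ih", "ng", "k"]
issues = pronunciation_diff(expected, recognized)
```

Here the result flags a single replacement of `th` by `s`. A real application would also need to account for recognizer errors before attributing a difference to the speaker.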

Academic Research

Phonetic Analysis

Supports the identification and classification of various phonemes in phonetic research.

Training Loss	Epoch	Step	Validation Loss	PER
7.9352	1.04	300	3.7710	0.9617
2.7874	2.08	600	0.9080	0.1929
0.8205	3.11	900	0.4670	0.1492
0.5504	4.15	1200	0.4025	0.1408
0.4632	5.19	1500	0.3696	0.1374
0.4148	6.23	1800	0.3519	0.1343
0.3873	7.27	2100	0.3419	0.1329
0.3695	8.3	2400	0.3368	0.1317
0.3531	9.34	2700	0.3406	0.1320
0.3507	10.38	3000	0.3354	0.1315

Property	Details
Model Type	wav2vec2-large-lv60_phoneme-timit_english_timit-4k
Training Data	TIMIT train dataset (4620 samples)
Evaluation Data	TIMIT test dataset (1680 samples)
License	Apache-2.0

