Xtreme_s_xlsr_300m_voxpopuli_en Open-source Speech Recognition Model - Supports Accurate English Speech-to-Text Conversion

Xtreme S Xlsr 300m Voxpopuli En

Developed by anton-l

This model is facebook/wav2vec2-xls-r-300m fine-tuned for speech recognition on the GOOGLE/XTREME_S - VOXPOPULI.EN dataset. It supports English speech-to-text tasks.

Speech Recognition

Transformers

English

Open Source License: Apache-2.0

#English Speech Recognition #Low Word Error Rate #Multi-GPU Training

Downloads 28

Release Time: 4/29/2022

Model Overview

This is a model optimized for English speech recognition tasks, fine-tuned on the VOXPOPULI.EN dataset, capable of converting English speech into text.

Model Features

Efficient Speech Recognition

Fine-tuned on the VOXPOPULI.EN dataset, optimized for English speech recognition

Based on wav2vec2-xls-r Architecture

Uses facebook's wav2vec2-xls-r-300m pre-trained model as the foundation

Multi-GPU Training Optimization

Trained with distributed multi-GPU training on 8 devices to improve training efficiency

Model Capabilities

English Speech Recognition

Speech-to-Text

Automatic Speech Recognition

Use Cases

Speech Transcription

Automatic Meeting Transcription

Automatically converts English meeting recordings into text transcripts

Results on the model's evaluation set: Character Error Rate (CER) 0.0966, Word Error Rate (WER) 0.1549

Podcast Content Transcription

Automatically converts English podcast content into text transcripts

Assistive Technology

Real-time Caption Generation

Generates real-time captions for English video content

🚀 xtreme_s_xlsr_300m_voxpopuli_en

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - VOXPOPULI.EN dataset.

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - VOXPOPULI.EN dataset. It achieves the following results on the evaluation set:

Cer: 0.0966
Loss: 0.3127
Wer: 0.1549
Predict Samples: 1842
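
The card does not include usage code, so here is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub as anton-l/xtreme_s_xlsr_300m_voxpopuli_en (inferred from the developer and model name, not confirmed by this card) and that transformers, torch, and ffmpeg are installed. The transcribe helper and the audio file name are hypothetical.

```python
# Assumption: Hub repo id inferred from the developer and model name on this card.
MODEL_ID = "anton-l/xtreme_s_xlsr_300m_voxpopuli_en"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one English audio file with the fine-tuned checkpoint."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    # The ASR pipeline loads the model together with its feature extractor
    # and tokenizer, and decodes the audio file via ffmpeg.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python transcribe.py sample.wav
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

Building the pipeline inside the function keeps the sketch short. For batch transcription it would be cheaper to create the pipeline once and reuse it.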

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2000
num_epochs: 10.0
mixed_precision_training: Native AMP
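
The batch-size and scheduler entries above fit together as sketched below. This is an illustration, not the training script. The linear_lr helper is hypothetical and mirrors a linear warmup followed by linear decay to zero, which is what a linear scheduler type with warmup steps does in Transformers. The total of 26,000 steps is an assumption taken from the last logged step in the training results, so the real schedule may end slightly later.

```python
# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 8
PER_DEVICE_EVAL_BATCH = 1
NUM_DEVICES = 8
PEAK_LR = 3e-4
WARMUP_STEPS = 2000
# Assumption: last step logged in the training results, not the exact total.
TOTAL_STEPS = 26_000

# Effective batch sizes are the per-device sizes times the device count.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES  # 64
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES    # 8


def linear_lr(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch)
    print("total_eval_batch_size:", total_eval_batch)
    for step in (0, 1000, 2000, 14000, 26000):
        print(step, f"{linear_lr(step):.6f}")
```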

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
1.4221	0.19	500	1.3325	0.8224	0.3432
0.8429	0.38	1000	0.7087	0.5028	0.2023
0.7377	0.57	1500	0.4900	0.2778	0.1339
0.5641	0.77	2000	0.4460	0.2540	0.1284
0.5787	0.96	2500	0.4242	0.2148	0.1167
0.3465	1.15	3000	0.4210	0.2087	0.1154
0.2787	1.34	3500	0.3954	0.2090	0.1155
0.2775	1.53	4000	0.3938	0.1992	0.1133
0.262	1.72	4500	0.3748	0.2104	0.1151
0.3138	1.92	5000	0.3825	0.1993	0.1134
0.4331	2.11	5500	0.3648	0.1935	0.1104
0.3802	2.3	6000	0.3966	0.1910	0.1109
0.3928	2.49	6500	0.3995	0.1898	0.1100
0.3441	2.68	7000	0.3764	0.1887	0.1103
0.3673	2.87	7500	0.3800	0.1843	0.1086
0.3422	3.07	8000	0.3932	0.1830	0.1092
0.2933	3.26	8500	0.3672	0.1915	0.1104
0.1785	3.45	9000	0.3820	0.1796	0.1072
0.321	3.64	9500	0.3533	0.1994	0.1126
0.1673	3.83	10000	0.3683	0.1856	0.1084
0.1757	4.02	10500	0.3365	0.1925	0.1102
0.1881	4.22	11000	0.3528	0.1775	0.1066
0.3106	4.41	11500	0.3909	0.1754	0.1063
0.25	4.6	12000	0.3734	0.1723	0.1052
0.2005	4.79	12500	0.3358	0.1900	0.1092
0.2982	4.98	13000	0.3513	0.1766	0.1060
0.1552	5.17	13500	0.3720	0.1729	0.1059
0.1645	5.37	14000	0.3569	0.1713	0.1044
0.2065	5.56	14500	0.3639	0.1720	0.1048
0.1898	5.75	15000	0.3660	0.1726	0.1050
0.1397	5.94	15500	0.3731	0.1670	0.1033
0.2056	6.13	16000	0.3782	0.1650	0.1030
0.1859	6.32	16500	0.3903	0.1667	0.1033
0.1374	6.52	17000	0.3721	0.1736	0.1048
0.2482	6.71	17500	0.3899	0.1643	0.1023
0.159	6.9	18000	0.3847	0.1687	0.1032
0.1487	7.09	18500	0.3817	0.1671	0.1030
0.1942	7.28	19000	0.4120	0.1616	0.1018
0.1517	7.47	19500	0.3856	0.1635	0.1020
0.0946	7.67	20000	0.3838	0.1621	0.1016
0.1455	7.86	20500	0.3749	0.1652	0.1020
0.1303	8.05	21000	0.4074	0.1615	0.1011
0.1207	8.24	21500	0.4121	0.1606	0.1008
0.0727	8.43	22000	0.3948	0.1607	0.1009
0.1123	8.62	22500	0.4025	0.1603	0.1009
0.1606	8.82	23000	0.3963	0.1580	0.1004
0.1458	9.01	23500	0.3991	0.1574	0.1002
0.2286	9.2	24000	0.4149	0.1596	0.1009
0.1284	9.39	24500	0.4251	0.1572	0.1002
0.1141	9.58	25000	0.4264	0.1579	0.1002
0.1823	9.77	25500	0.4230	0.1562	0.0999
0.2514	9.97	26000	0.4242	0.1564	0.0999
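
The Wer and Cer columns above are word error rate and character error rate: the edit distance between the model's transcript and the reference, divided by the reference length in words or characters. The sketch below shows how such scores are commonly computed in pure Python. The evaluation code behind this card is not shown, so the helper names and example sentences are illustrative assumptions, and details such as text normalization may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Made-up sentences for illustration, not taken from the dataset.
    reference = "the committee will vote on the proposal tomorrow"
    hypothesis = "the committee will vote on a proposal tomorrow"
    print(f"WER: {wer(reference, hypothesis):.4f}")
    print(f"CER: {cer(reference, hypothesis):.4f}")
```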

Framework versions

Transformers 4.18.0.dev0
PyTorch 1.10.1+cu111
Datasets 1.18.4.dev0
Tokenizers 0.11.6

📄 License

This model is licensed under the Apache-2.0 license.
