The xtreme_s_xlsr_300m_fleurs_langid open-source model: spoken language identification across 102 languages

Xtreme S Xlsr 300m Fleurs Langid

Developed by anton-l

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - FLEURS.ALL dataset for multilingual spoken language identification.

Audio Classification

Transformers

Other

Open Source License: Apache-2.0

#Multilingual language identification #High-accuracy language classification #Low-resource language support

Downloads: 17

Release time: 4/6/2022

Model Overview

This is a spoken language identification model based on the wav2vec2-xls-r-300m architecture and fine-tuned on the FLEURS.ALL dataset: given an audio clip, it predicts which of 102 languages is being spoken.

Model Features

Multilingual support

Identifies 102 languages (the full FLEURS language set), including widely spoken and low-resource languages

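Per-language accuracy

Reaches 72.71% overall accuracy on the evaluation set. Accuracy varies widely by language: it exceeds 99% for languages such as Arabic (Egypt, 99.77%) and Bengali (India, 99.89%), but falls below 5% for others such as Urdu (1.34%) and Swedish (1.45%)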

Based on XLS-R architecture

Built on Facebook's wav2vec2-xls-r-300m, a 300-million-parameter wav2vec 2.0 model pretrained on unlabeled speech in 128 languages, which supplies the speech representations used for classification
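To make the architecture concrete, here is a minimal pure-Python sketch of how an audio-classification head on top of a wav2vec 2.0 encoder typically turns encoder frames into a language prediction. It assumes the head layout of Transformers' `Wav2Vec2ForSequenceClassification` (a linear projector, mean pooling over frames, then a linear classifier); the real model uses a hidden size of 1024 and, assuming the library default `classifier_proj_size`, a 256-dimensional projector, whereas the sketch uses toy sizes and random placeholder weights. The frame count follows the wav2vec 2.0 convolutional feature encoder (kernel sizes and strides below) applied to 16 kHz audio.

```python
import math
import random

# Kernel sizes and strides of the wav2vec 2.0 / XLS-R convolutional feature encoder.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples):
    """Number of encoder frames produced for a waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def linear(vec, weights, bias):
    """Compute vec @ weights + bias; weights is a list of rows, one per input feature."""
    return [sum(v * row[j] for v, row in zip(vec, weights)) + b for j, b in enumerate(bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, w_proj, b_proj, w_cls, b_cls):
    """Project each frame, mean-pool over time, and return class probabilities."""
    projected = [linear(f, w_proj, b_proj) for f in frames]
    pooled = [sum(col) / len(projected) for col in zip(*projected)]
    return softmax(linear(pooled, w_cls, b_cls))


# Toy dimensions (the real model uses hidden size 1024); 102 FLEURS languages as classes.
random.seed(0)
HIDDEN, PROJ, NUM_LANGS = 16, 8, 102
n_frames = num_frames(16000)  # one second of 16 kHz audio
frames = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(n_frames)]
w_proj = [[random.gauss(0, 0.1) for _ in range(PROJ)] for _ in range(HIDDEN)]
w_cls = [[random.gauss(0, 0.1) for _ in range(NUM_LANGS)] for _ in range(PROJ)]
probs = classify(frames, w_proj, [0.0] * PROJ, w_cls, [0.0] * NUM_LANGS)
predicted = max(range(NUM_LANGS), key=probs.__getitem__)
print(n_frames, len(probs), predicted)
```

One second of 16 kHz audio yields 49 encoder frames (roughly one every 20 ms), and mean pooling collapses any number of frames into a single vector, which is why clips of different lengths can share one classifier.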

Model Capabilities

Audio classification

Multilingual processing

Language identification

Use Cases

Speech transcription pipelines

Multilingual meeting minutes

Used to detect the spoken language of each segment in multilingual meetings so that it can be passed to a transcription model for that language

The model itself outputs a language label, not a transcript

Voice assistants

Used as the language-identification front end of multilingual voice assistants

Can determine which language a user is speaking before command recognition runs

Language learning

Target-language checks

Used in language learning applications to check whether a learner is speaking the intended language

Does not score pronunciation quality; it only predicts which language is being spoken
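This model is an audio classifier that outputs a language label rather than a transcript, so in transcription or voice-assistant settings it would typically sit in front of separate per-language recognizers. A minimal sketch of that routing step, assuming predictions in the `[{"label": ..., "score": ...}]` shape returned by the Transformers audio-classification pipeline; the label strings, the model names in the hypothetical `ASR_MODELS` table, and the 0.5 threshold are all placeholders, not values from this model card.

```python
from typing import Dict, List

# Hypothetical mapping from language-ID labels to downstream ASR model names.
ASR_MODELS: Dict[str, str] = {
    "en_us": "my-org/asr-english",
    "fr_fr": "my-org/asr-french",
}
# Hypothetical multilingual recognizer used when the language is uncertain or unmapped.
FALLBACK_MODEL = "my-org/asr-multilingual"


def route(predictions: List[dict], min_score: float = 0.5) -> str:
    """Choose an ASR model from language-ID predictions of the form {"label", "score"}."""
    if not predictions:
        return FALLBACK_MODEL
    top = max(predictions, key=lambda p: p["score"])
    if top["score"] < min_score:
        # Low confidence: per-language accuracy varies widely, so do not trust the label.
        return FALLBACK_MODEL
    return ASR_MODELS.get(top["label"], FALLBACK_MODEL)
```

A confidence threshold matters here because several languages in the evaluation table are identified correctly less than 10% of the time; falling back to a multilingual recognizer avoids sending audio to the wrong single-language model.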

🚀 xtreme_s_xlsr_300m_fleurs_langid

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - FLEURS.ALL dataset. It performs spoken language identification; its accuracy and loss on each language are reported below.

🚀 Quick Start

The model classifies an audio clip into one of the 102 FLEURS languages. It was fine-tuned on the GOOGLE/XTREME_S - FLEURS.ALL dataset starting from the pretrained facebook/wav2vec2-xls-r-300m checkpoint.
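A minimal quick-start sketch with the Transformers `pipeline` API, assuming the Hub ID is `anton-l/xtreme_s_xlsr_300m_fleurs_langid` (inferred from the developer and model names on this page) and that the checkpoint loads as a standard audio-classification model. The pipeline import is deferred inside a function so the helper can be used without Transformers installed; the label strings it returns come from the model's own configuration and are not listed here.

```python
from typing import Any, Dict, List

# Assumed Hub ID, built from the developer name and model name shown on this page.
MODEL_ID = "anton-l/xtreme_s_xlsr_300m_fleurs_langid"


def load_langid_pipeline(model_id: str = MODEL_ID):
    """Build an audio-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so this module runs without it

    return pipeline("audio-classification", model=model_id)


def format_predictions(predictions: List[Dict[str, Any]]) -> List[str]:
    """Render pipeline output as 'label: percent' strings, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['label']}: {p['score']:.1%}" for p in ranked]
```

Usage, with `speech.wav` as a hypothetical input file: `clf = load_langid_pipeline()`, then `print("\n".join(format_predictions(clf("speech.wav", top_k=3))))`. Passing a file path requires ffmpeg to be installed; alternatively, pass a 1-D float array of 16 kHz mono audio, the sampling rate wav2vec2-xls-r-300m expects.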

📚 Documentation

Evaluation Results

The model achieves the following results on the evaluation set:

Language	Accuracy	Loss
Overall (all languages)	0.7271	1.3789
Af Za	0.3865	2.6778
Am Et	0.8818	0.4615
Ar Eg	0.9977	0.0149
As In	0.9858	0.0764
Ast Es	0.8362	0.4560
Az Az	0.8386	0.5677
Be By	0.4085	1.9231
Bn In	0.9989	0.0024
Bs Ba	0.2508	2.4954
Ca Es	0.6947	1.2632
Ceb Ph	0.9852	0.0426
Cmn Hans Cn	0.9799	0.0650
Cs Cz	0.5353	1.9334
Cy Gb	0.9716	0.1274
Da Dk	0.6688	1.4990
De De	0.7807	0.8820
El Gr	0.7692	0.9839
En Us	0.9815	0.0827
Es 419	0.9846	0.0516
Et Ee	0.5230	1.9264
Fa Ir	0.8462	0.6520
Ff Sn	0.2348	5.4283
Fi Fi	0.9978	0.0109
Fil Ph	0.9564	0.1706
Fr Fr	0.9852	0.0591
Ga Ie	0.8468	0.5174
Gl Es	0.5016	1.2657
Gu In	0.973	0.0850
Ha Ng	0.9163	0.3234
He Il	0.8043	0.8299
Hi In	0.9354	0.4190
Hr Hr	0.3654	2.9754
Hu Hu	0.8044	0.8345
Hy Am	0.9914	0.0329
Id Id	0.9869	0.0529
Ig Ng	0.9360	0.2523
Is Is	0.0217	6.5153
It It	0.8	0.8113
Ja Jp	0.7385	1.3968
Jv Id	0.5824	2.0009
Ka Ge	0.8611	0.6162
Kam Ke	0.4184	2.2192
Kea Cv	0.8692	0.5567
Kk Kz	0.8727	0.5592
Km Kh	0.7030	1.7358
Kn In	0.9630	0.1063
Ko Kr	0.9843	0.1519
Ku Arab Iq	0.9577	0.2075
Ky Kg	0.8936	0.4639
Lb Lu	0.8897	0.4454
Lg Ug	0.9253	0.3764
Ln Cd	0.9644	0.1844
Lo La	0.1580	3.8051
Lt Lt	0.4686	2.5054
Luo Ke	0.9922	0.0479
Lv Lv	0.6498	1.3713
Mi Nz	0.9613	0.1390
Mk Mk	0.7636	0.7952
Ml In	0.6962	1.2999
Mn Mn	0.8462	0.7621
Mr In	0.3911	3.7056
Ms My	0.3632	3.0192
Mt Mt	0.6188	1.5520
My Mm	0.9705	0.1514
Nb No	0.6891	1.1194
Ne Np	0.8994	0.4231
Nl Nl	0.9093	0.3291
Nso Za	0.8873	0.5106
Ny Mw	0.4691	2.7346
Oci Fr	0.1533	5.0983
Om Et	0.9512	0.2297
Or In	0.5447	2.5432
Pa In	0.8153	0.7753
Pl Pl	0.7757	0.7309
Ps Af	0.8105	1.0454
Pt Br	0.7715	0.9782
Ro Ro	0.4122	3.5829
Ru Ru	0.9794	0.0598
Rup Bg	0.9468	0.1695
Sd Arab In	0.5245	2.6198
Sk Sk	0.8624	0.5583
Sl Si	0.0300	6.0923
Sn Zw	0.8843	0.4465
So So	0.8803	0.4492
Sr Rs	0.0257	4.7575
Sv Se	0.0145	6.5858
Sw Ke	0.9199	0.4235
Ta In	0.9526	0.1818
Te In	0.9788	0.0808
Tg Tj	0.9883	0.0912
Th Th	0.9912	0.0462
Tr Tr	0.7887	0.7340
Uk Ua	0.0627	4.6777
Umb Ao	0.7863	1.4021
Ur Pk	0.0134	8.4067
Uz Uz	0.4014	4.3297
Vi Vn	0.7246	1.1304
Wo Sn	0.4555	2.2281
Xh Za	1.0	0.0009
Yo Ng	0.7353	1.3345
Yue Hant Hk	0.7985	1.0728
Zu Za	0.4696	3.7279

Number of prediction samples: 77960
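Accuracy in the table is the fraction of utterances whose predicted language matches the reference language, and the loss is most likely the mean cross-entropy of the classifier (the standard loss for single-label sequence classification in Transformers). A minimal sketch of computing both per language from raw classifier scores, assuming plain lists of logits, integer class labels, and a language tag per utterance; it illustrates the metrics and is not the evaluation script that produced the numbers above.

```python
import math
from collections import defaultdict


def per_language_metrics(logits, labels, languages):
    """Accuracy and mean cross-entropy, grouped by language and pooled as "overall".

    logits: per-utterance lists of class scores; labels: true class indices;
    languages: language tag (e.g. a FLEURS locale such as "af_za") of each utterance.
    """
    stats = defaultdict(lambda: [0, 0.0, 0])  # [correct, summed loss, count]
    for row, label, lang in zip(logits, labels, languages):
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))  # log-sum-exp
        loss = log_norm - row[label]  # -log softmax(row)[label]
        correct = int(max(range(len(row)), key=row.__getitem__) == label)
        for key in (lang, "overall"):
            stats[key][0] += correct
            stats[key][1] += loss
            stats[key][2] += 1
    return {k: {"accuracy": c / n, "loss": s / n} for k, (c, s, n) in stats.items()}
```

The pooled "overall" figure weights every utterance equally, so languages with more evaluation utterances pull it harder; it is not the unweighted mean of the per-language rows.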

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

Hyperparameter	Value
learning_rate	0.0003
train_batch_size	8
eval_batch_size	1
seed	42
distributed_type	multi-GPU
num_devices	8
total_train_batch_size	64
total_eval_batch_size	8
optimizer	Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type	linear
lr_scheduler_warmup_steps	2000
num_epochs	5.0
mixed_precision_training	Native AMP
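A minimal sketch of how these hyperparameters fit together, assuming the Hugging Face `Trainer` was used (the training script itself is not given here). The dictionary uses `TrainingArguments` field names and could be passed as `TrainingArguments(output_dir=..., **TRAINING_KWARGS)`; that mapping is our reading of the table, not the author's configuration file. The schedule function reproduces the formula of the `linear` scheduler: a linear ramp from 0 to the base learning rate over the warmup steps, then linear decay to 0. The total step count is an estimate (about 3,854 steps per epoch times 5 epochs, read off the training log), not a documented value.

```python
# Hyperparameters from the table, expressed with TrainingArguments field names (assumed mapping).
TRAINING_KWARGS = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 2000,
    "num_train_epochs": 5.0,
    "fp16": True,  # mixed-precision training ("Native AMP" in the table)
}
NUM_DEVICES = 8
ESTIMATED_TOTAL_STEPS = 19_270  # estimate: ~3,854 steps/epoch x 5 epochs


def total_train_batch_size(kwargs, num_devices, grad_accum_steps=1):
    """Effective batch size per optimizer step across all devices."""
    return kwargs["per_device_train_batch_size"] * num_devices * grad_accum_steps


def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

With 8 samples per device on 8 devices and no gradient accumulation, each optimizer step sees 64 samples, matching `total_train_batch_size` in the table.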

Training Results

Training Loss	Epoch	Step	Accuracy	Validation Loss
0.5296	0.26	1000	0.4016	2.6633
0.4252	0.52	2000	0.5751	1.8582
0.2989	0.78	3000	0.6332	1.6780
0.3563	1.04	4000	0.6799	1.4479
0.1617	1.3	5000	0.6679	1.5066
0.1409	1.56	6000	0.6992	1.4082
0.01	1.82	7000	0.7071	1.2448
0.0018	2.08	8000	0.7148	1.1996
0.0014	2.34	9000	0.6410	1.6505
0.0188	2.6	10000	0.6840	1.4050
0.0007	2.86	11000	0.6621	1.5831
0.1038	3.12	12000	0.6829	1.5441
0.0003	3.38	13000	0.6900	1.3483
0.0004	3.64	14000	0.6414	1.7070
0.0003	3.9	15000	0.7075	1.3198
0.0002	4.16	16000	0.7105	1.3118
0.0001	4.42	17000	0.7029	1.4099
0.0	4.68	18000	0.7180	1.3658
0.0001	4.93	19000	0.7236	1.3514
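The two validation metrics in this log do not peak at the same checkpoint, while the training loss falls to nearly zero, which is worth checking before picking a checkpoint. A minimal sketch that reads the rows above and reports the best step by each criterion, plus an approximate number of optimizer steps per epoch; because the epoch column is rounded to two decimals, that last figure is only an estimate. Which checkpoint was actually published is not stated on this page.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    accuracy: float
    val_loss: float


# Rows copied from the training-results table (training loss omitted).
LOG = [
    EvalRow(1000, 0.26, 0.4016, 2.6633),
    EvalRow(2000, 0.52, 0.5751, 1.8582),
    EvalRow(3000, 0.78, 0.6332, 1.6780),
    EvalRow(4000, 1.04, 0.6799, 1.4479),
    EvalRow(5000, 1.30, 0.6679, 1.5066),
    EvalRow(6000, 1.56, 0.6992, 1.4082),
    EvalRow(7000, 1.82, 0.7071, 1.2448),
    EvalRow(8000, 2.08, 0.7148, 1.1996),
    EvalRow(9000, 2.34, 0.6410, 1.6505),
    EvalRow(10000, 2.60, 0.6840, 1.4050),
    EvalRow(11000, 2.86, 0.6621, 1.5831),
    EvalRow(12000, 3.12, 0.6829, 1.5441),
    EvalRow(13000, 3.38, 0.6900, 1.3483),
    EvalRow(14000, 3.64, 0.6414, 1.7070),
    EvalRow(15000, 3.90, 0.7075, 1.3198),
    EvalRow(16000, 4.16, 0.7105, 1.3118),
    EvalRow(17000, 4.42, 0.7029, 1.4099),
    EvalRow(18000, 4.68, 0.7180, 1.3658),
    EvalRow(19000, 4.93, 0.7236, 1.3514),
]

best_by_accuracy = max(LOG, key=lambda r: r.accuracy)
best_by_loss = min(LOG, key=lambda r: r.val_loss)
# Estimate steps per epoch from the last row (epoch values are rounded to 2 decimals).
steps_per_epoch = LOG[-1].step / LOG[-1].epoch
print(best_by_accuracy.step, best_by_loss.step, round(steps_per_epoch))
```

On these rows the highest accuracy (0.7236) is at step 19000, while the lowest validation loss (1.1996) is at step 8000, so the choice of selection metric changes which checkpoint counts as best.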

Framework Versions

Transformers 4.18.0.dev0
Pytorch 1.10.1+cu111
Datasets 1.18.4.dev0
Tokenizers 0.11.6

📄 License

The model is licensed under the Apache-2.0 license.
