wav2vec2-xls-r-300m-italian-robust: Open-Source Model for Automatic Italian Speech Recognition

Wav2vec2 Xls R 300m Italian Robust

Developed by dbdmg

An automatic speech recognition model based on facebook/wav2vec2-xls-r-300m and fine-tuned on multiple Italian speech datasets

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Italian Speech Recognition #Multi-dataset Fine-tuning #Low CER Performance

Downloads 28

Release Time: 3/2/2022

Model Overview

This is an automatic speech recognition (ASR) model for Italian. It is based on the XLS-R architecture, was fine-tuned on public datasets such as Common Voice, and supports decoding with a language model for improved accuracy.

Model Features

Multi-dataset Training

Fine-tuned on Italian speech from several datasets, including Common Voice, LibriSpeech multilingual, and TED multilingual, to improve model robustness

Language Model Enhancement

Supports recognition combined with a language model, which reduces WER by roughly 30% in relative terms (for example, from 17.17 to 12.07 on the Common Voice 7 test set)

Cross-scenario Adaptation

Evaluated on the Robust Speech Event dev and test data, which cover recording conditions beyond Common Voice (WER 24.29 on the dev data, 17.36 with a language model)

Model Capabilities

Italian Speech-to-Text

Enhanced Recognition with Language Model

Multiple Accent Recognition

Use Cases

Speech Transcription

Meeting Minutes

Convert Italian meeting recordings into text transcripts

CER 3.52% on the Common Voice 7 test set (with language model)

Media Subtitle Generation

Automatically generate subtitles for Italian video content

Voice Interaction

Voice Assistant

Supports Italian voice command recognition

🚀 wav2vec2-xls-r-300m-italian-robust

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Italian speech recognition. It has been trained on multiple Italian datasets and offers high-quality automatic speech recognition.
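As a rough illustration of how a checkpoint like this is typically used, the sketch below wraps the Hugging Face Transformers `pipeline` API in a small helper. The Hub id `dbdmg/wav2vec2-xls-r-300m-italian-robust` is an assumption pieced together from the developer and model names on this page, `transcribe` is a hypothetical helper name, and reading audio from a file path assumes `ffmpeg` is available on the system.

```python
# Minimal inference sketch. MODEL_ID is an ASSUMED Hub id (developer name +
# model name from this page); verify it before relying on it.
MODEL_ID = "dbdmg/wav2vec2-xls-r-300m-italian-robust"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file and return the recognized text.

    `transcribe` is a hypothetical helper, not part of any library.
    The import is done lazily so the module can be loaded without
    transformers installed.
    """
    from transformers import pipeline

    # Build an ASR pipeline; this downloads the checkpoint on first use.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample_it.wav")` (a hypothetical file name) would return the transcript as a plain string. Whether language-model decoding is applied depends on what the checkpoint repository ships and on optional decoder packages being installed, so treat this as plain inference unless you have confirmed otherwise.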

✨ Features

Multi-dataset Training: Trained on a variety of Italian datasets, including Mozilla Foundation Common Voice V7, LibriSpeech multilingual, TED multilingual, Voxforge, M-AILABS Speech Dataset, EuroParl-ST, EMOVO, and MSPKA.
Reported Metrics: Evaluated with Word Error Rate (WER) and Character Error Rate (CER) on several datasets; on the Common Voice 7 test set it reaches 17.17 WER and 4.27 CER without a language model.
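For readers unfamiliar with the two metrics named above, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. The function names are illustrative, and this is not necessarily the exact scoring script used for the reported numbers (text normalization and whitespace handling vary between toolkits).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length.
    Spaces are counted as characters here, which is one common convention."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "il gatto dorme sul divano"
hypothesis = "il gato dorme divano"
print(f"WER: {wer(reference, hypothesis):.2f}")  # 2 word edits / 5 words
print(f"CER: {cer(reference, hypothesis):.2f}")  # 5 char edits / 25 chars
```

In the example, one substituted word and one deleted word give a WER of 0.40, while the five missing characters give a CER of 0.20, which shows why CER is usually much lower than WER for the same output.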

📚 Documentation

Model Information

Property	Details
Model Type	Fine-tuned version of facebook/wav2vec2-xls-r-300m for Italian
Training Data	Mozilla Foundation Common Voice V7 dataset, LibriSpeech multilingual, TED multilingual, Voxforge, M-AILABS Speech Dataset, EuroParl-ST, EMOVO, MSPKA

Model Index

Name: XLS-R-300m - Italian
Results:
- Task: Automatic Speech Recognition
- Datasets and Metrics:
  - Common Voice 7:
    - Test WER: 17.17
    - Test CER: 4.27
    - Test WER (+LM): 12.07
    - Test CER (+LM): 3.52
  - Robust Speech Event - Dev Data:
    - Test WER: 24.29
    - Test CER: 8.1
    - Test WER (+LM): 17.36
    - Test CER (+LM): 7.94
  - Robust Speech Event - Test Data:
    - Test WER: 33.66
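To make the effect of the language model concrete, the short sketch below computes the relative error reduction from the figures listed above. Only datasets with both plain and +LM results are included; the dictionary layout and the `relative_reduction` helper are illustrative choices, not part of the model card.

```python
# Figures copied from the results above (all values in percent).
RESULTS = {
    "Common Voice 7": {
        "wer": 17.17, "cer": 4.27, "wer_lm": 12.07, "cer_lm": 3.52,
    },
    "Robust Speech Event - Dev Data": {
        "wer": 24.29, "cer": 8.1, "wer_lm": 17.36, "cer_lm": 7.94,
    },
}


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement in percent when going from `before` to `after`."""
    return (before - after) / before * 100


for name, m in RESULTS.items():
    wer_gain = relative_reduction(m["wer"], m["wer_lm"])
    cer_gain = relative_reduction(m["cer"], m["cer_lm"])
    print(f"{name}: WER -{wer_gain:.1f}% rel., CER -{cer_gain:.1f}% rel.")
```

The WER reductions come out at about 29.7% and 28.5%. The CER gains are smaller, which is typical: a language model mostly corrects whole-word errors, and those weigh more heavily in WER.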

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 10.0
mixed_precision_training: Native AMP
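The learning-rate settings above (peak 0.0003, linear scheduler, 500 warmup steps) describe a schedule that ramps up linearly and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The total step count is not stated on this page; `TOTAL_STEPS_ESTIMATE` is a rough assumption extrapolated from the training log (step 65600 at epoch 9.04, scaled to 10 epochs), and `linear_schedule_lr` is an illustrative name.

```python
BASE_LR = 3e-4       # learning_rate from the list above
WARMUP_STEPS = 500   # lr_scheduler_warmup_steps from the list above

# ASSUMPTION: about 7,250 steps per epoch (65600 / 9.04) times 10 epochs.
TOTAL_STEPS_ESTIMATE = 72_500


def linear_schedule_lr(step: int,
                       total_steps: int = TOTAL_STEPS_ESTIMATE,
                       base_lr: float = BASE_LR,
                       warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 36_500, 72_500):
    print(f"step {step:>6}: lr = {linear_schedule_lr(step):.6f}")
```

With these numbers the warmup covers well under 1% of training, so the run spends almost all of its time on the decaying part of the schedule.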

Training Results

Training Loss	Epoch	Step	Validation Loss	Wer
No log	0.06	400	0.7508	0.7354
2.3127	0.11	800	0.5888	0.5882
0.7256	0.17	1200	0.5121	0.5247
0.6692	0.22	1600	0.4774	0.5028
0.6384	0.28	2000	0.4832	0.4885
0.6384	0.33	2400	0.4410	0.4581
0.6199	0.39	2800	0.4160	0.4331
0.5972	0.44	3200	0.4136	0.4275
0.6048	0.5	3600	0.4362	0.4538
0.5627	0.55	4000	0.4313	0.4469
0.5627	0.61	4400	0.4425	0.4579
0.5855	0.66	4800	0.3859	0.4133
0.5702	0.72	5200	0.3974	0.4097
0.55	0.77	5600	0.3931	0.4134
0.5624	0.83	6000	0.3900	0.4126
0.5624	0.88	6400	0.3622	0.3899
0.5615	0.94	6800	0.3755	0.4067
0.5472	0.99	7200	0.3980	0.4284
0.5663	1.05	7600	0.3553	0.3782
0.5189	1.1	8000	0.3538	0.3726
0.5189	1.16	8400	0.3425	0.3624
0.518	1.21	8800	0.3431	0.3651
0.5399	1.27	9200	0.3442	0.3573
0.5303	1.32	9600	0.3241	0.3404
0.5043	1.38	10000	0.3175	0.3378
0.5043	1.43	10400	0.3265	0.3501
0.4968	1.49	10800	0.3539	0.3703
0.5102	1.54	11200	0.3323	0.3506
0.5008	1.6	11600	0.3188	0.3433
0.4996	1.65	12000	0.3162	0.3388
0.4996	1.71	12400	0.3353	0.3552
0.5007	1.76	12800	0.3152	0.3317
0.4956	1.82	13200	0.3207	0.3430
0.5205	1.87	13600	0.3239	0.3430
0.4829	1.93	14000	0.3134	0.3266
0.4829	1.98	14400	0.3039	0.3291
0.5251	2.04	14800	0.2944	0.3169
0.4872	2.09	15200	0.3061	0.3228
0.4805	2.15	15600	0.3034	0.3152
0.4949	2.2	16000	0.2896	0.3066
0.4949	2.26	16400	0.3059	0.3344
0.468	2.31	16800	0.2932	0.3111
0.4637	2.37	17200	0.2890	0.3074
0.4638	2.42	17600	0.2893	0.3112
0.4728	2.48	18000	0.2832	0.3013
0.4728	2.54	18400	0.2921	0.3065
0.456	2.59	18800	0.2961	0.3104
0.4628	2.65	19200	0.2886	0.3109
0.4534	2.7	19600	0.2828	0.3020
0.4578	2.76	20000	0.2805	0.3026
0.4578	2.81	20400	0.2796	0.2987
0.4702	2.87	20800	0.2748	0.2906
0.4487	2.92	21200	0.2819	0.3008
0.4411	2.98	21600	0.2722	0.2868
0.4631	3.03	22000	0.2814	0.2974
0.4631	3.09	22400	0.2762	0.2894
0.4591	3.14	22800	0.2802	0.2980
0.4349	3.2	23200	0.2748	0.2951
0.4339	3.25	23600	0.2792	0.2927
0.4254	3.31	24000	0.2712	0.2911
0.4254	3.36	24400	0.2719	0.2892
0.4317	3.42	24800	0.2686	0.2861
0.4282	3.47	25200	0.2632	0.2861
0.4262	3.53	25600	0.2633	0.2817
0.4162	3.58	26000	0.2561	0.2765
0.4162	3.64	26400	0.2613	0.2847
0.414	3.69	26800	0.2679	0.2824
0.4132	3.75	27200	0.2569	0.2813
0.405	3.8	27600	0.2589	0.2785
0.4128	3.86	28000	0.2611	0.2714
0.4128	3.91	28400	0.2548	0.2731
0.4174	3.97	28800	0.2574	0.2716
0.421	4.02	29200	0.2529	0.2700
0.4109	4.08	29600	0.2547	0.2682
0.4027	4.13	30000	0.2578	0.2758
0.4027	4.19	30400	0.2511	0.2715
0.4075	4.24	30800	0.2507	0.2601
0.3947	4.3	31200	0.2552	0.2711
0.4042	4.35	31600	0.2530	0.2695
0.3907	4.41	32000	0.2543	0.2738
0.3907	4.46	32400	0.2491	0.2629
0.3895	4.52	32800	0.2471	0.2611
0.3901	4.57	33200	0.2404	0.2559
0.3818	4.63	33600	0.2378	0.2583
0.3831	4.68	34000	0.2341	0.2499
0.3831	4.74	34400	0.2379	0.2560
0.3808	4.79	34800	0.2418	0.2553
0.4015	4.85	35200	0.2378	0.2565
0.407	4.9	35600	0.2375	0.2535
0.38	4.96	36000	0.2329	0.2451
0.38	5.02	36400	0.2541	0.2737
0.3753	5.07	36800	0.2475	0.2580
0.3701	5.13	37200	0.2356	0.2484
0.3627	5.18	37600	0.2422	0.2552
0.3652	5.24	38000	0.2353	0.2518
0.3652	5.29	38400	0.2328	0.2452
0.3667	5.35	38800	0.2358	0.2478
0.3711	5.4	39200	0.2340	0.2463
0.361	5.46	39600	0.2375	0.2452
0.3655	5.51	40000	0.2292	0.2387
0.3655	5.57	40400	0.2330	0.2432
0.3637	5.62	40800	0.2242	0.2396
0.3516	5.68	41200	0.2284	0.2394
0.3498	5.73	41600	0.2254	0.2343
0.3626	5.79	42000	0.2191	0.2318
0.3626	5.84	42400	0.2261	0.2399
0.3719	5.9	42800	0.2261	0.2411
0.3563	5.95	43200	0.2259	0.2416
0.3574	6.01	43600	0.2148	0.2249
0.3339	6.06	44000	0.2173	0.2237
0.3339	6.12	44400	0.2133	0.2238
0.3303	6.17	44800	0.2193	0.2297
0.331	6.23	45200	0.2122	0.2205
0.3372	6.28	45600	0.2083	0.2215
0.3427	6.34	46000	0.2079	0.2163
0.3427	6.39	46400	0.2072	0.2154
0.3215	6.45	46800	0.2067	0.2170
0.3246	6.5	47200	0.2089	0.2183
0.3217	6.56	47600	0.2030	0.2130
0.3309	6.61	48000	0.2020	0.2123
0.3309	6.67	48400	0.2054	0.2133
0.3343	6.72	48800	0.2013	0.2128
0.3213	6.78	49200	0.1971	0.2064
0.3145	6.83	49600	0.2029	0.2107
0.3274	6.89	50000	0.2038	0.2136
0.3274	6.94	50400	0.1991	0.2064
0.3202	7.0	50800	0.1970	0.2083
0.314	7.05	51200	0.1970	0.2035
0.3031	7.11	51600	0.1943	0.2053
0.3004	7.16	52000	0.1942	0.1985
0.3004	7.22	52400	0.1941	0.2003
0.3029	7.27	52800	0.1936	0.2008
0.2915	7.33	53200	0.1935	0.1995
0.3005	7.38	53600	0.1943	0.2032
0.2984	7.44	54000	0.1913	0.1978
0.2984	7.5	54400	0.1907	0.1965
0.2978	7.55	54800	0.1881	0.1958
0.2944	7.61	55200	0.1887	0.1966
0.3004	7.66	55600	0.1870	0.1930
0.3099	7.72	56000	0.1906	0.1976
0.3099	7.77	56400	0.1856	0.1939
0.2917	7.83	56800	0.1883	0.1961
0.2924	7.88	57200	0.1864	0.1930
0.3061	7.94	57600	0.1831	0.1872
0.2834	7.99	58000	0.1835	0.1896
0.2834	8.05	58400	0.1828	0.1875
0.2807	8.1	58800	0.1820	0.1874
0.2765	8.16	59200	0.1807	0.1869
0.2737	8.21	59600	0.1810	0.1848
0.2722	8.27	60000	0.1795	0.1829
0.2722	8.32	60400	0.1785	0.1826
0.272	8.38	60800	0.1802	0.1836
0.268	8.43	61200	0.1771	0.1813
0.2695	8.49	61600	0.1773	0.1821
0.2686	8.54	62000	0.1756	0.1814
0.2686	8.6	62400	0.1740	0.1770
0.2687	8.65	62800	0.1748	0.1769
0.2686	8.71	63200	0.1734	0.1766
0.2683	8.76	63600	0.1722	0.1759
0.2686	8.82	64000	0.1719	0.1760
0.2686	8.87	64400	0.1720	0.1743
0.2626	8.93	64800	0.1696	0.1742
0.2587	8.98	65200	0.1690	0.1718
0.2554	9.04	65600	0.1704	0.1718
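A table like the one above is easier to inspect programmatically. The sketch below parses a tab-separated excerpt of it (five rows copied from the table, not the full log) and picks the checkpoint with the lowest validation loss. It assumes the columns are tab-delimited as shown here; `parse_log` is an illustrative helper name.

```python
import csv
import io

# Five rows copied from the table above; the full log has many more.
LOG_EXCERPT = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tWer\n"
    "No log\t0.06\t400\t0.7508\t0.7354\n"
    "2.3127\t0.11\t800\t0.5888\t0.5882\n"
    "0.38\t4.96\t36000\t0.2329\t0.2451\n"
    "0.2587\t8.98\t65200\t0.1690\t0.1718\n"
    "0.2554\t9.04\t65600\t0.1704\t0.1718\n"
)


def parse_log(text: str) -> list:
    """Parse the tab-separated training log into a list of dicts."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter="\t"):
        loss = raw["Training Loss"]
        rows.append({
            # "No log" appears before the first training loss is recorded.
            "train_loss": None if loss == "No log" else float(loss),
            "epoch": float(raw["Epoch"]),
            "step": int(raw["Step"]),
            "val_loss": float(raw["Validation Loss"]),
            "wer": float(raw["Wer"]),
        })
    return rows


rows = parse_log(LOG_EXCERPT)
best = min(rows, key=lambda r: r["val_loss"])
print(f"Lowest validation loss {best['val_loss']:.4f} "
      f"at step {best['step']} (WER {best['wer']:.4f})")
```

Note that the WER column in the log is a fraction (0.1718 is 17.18%), whereas the results section reports percentages.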

📄 License

This model is licensed under the Apache-2.0 license.
