first_model: Open-Source Speech Recognition Model for Speech-to-Text Transcription

First Model

Developed by Vkt

This speech recognition model is facebook/wav2vec2-xls-r-300m fine-tuned on the common_voice dataset. It reports a word error rate (WER) of 0.0141 on the evaluation set.

Speech Recognition

Transformers

Open Source License: Apache-2.0 #Speech Recognition #Low Word Error Rate #Multilingual Support

Downloads 26

Release Time: 3/28/2022

Model Overview

This is a fine-tuned model for speech recognition tasks, based on the wav2vec2-xls-r-300m architecture and trained on the common_voice dataset.

Model Features

Low Word Error Rate

Reports a word error rate (WER) of 0.0141 on the evaluation set at the last two logged checkpoints (steps 41200 and 41600). The preceding checkpoint (step 40800) logged a WER of 0.1545.

Fine-tuned Based on a Large Model

Fine-tuned from the pretrained facebook/wav2vec2-xls-r-300m checkpoint (about 300 million parameters), reusing its learned speech representations.

Efficient Training

Trained with native automatic mixed precision (AMP) and gradient accumulation over 2 steps, giving an effective batch size of 16.
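
Gradient accumulation can be illustrated with a small, framework-free sketch. It assumes a toy one-parameter linear model with mean squared error, not the actual wav2vec2 training loop, and all function names are hypothetical. It checks that averaging the gradients of 2 micro-batches of 8 samples gives the same result as one batch of 16 samples.

```python
# Toy illustration only: a one-parameter model y = w * x with mean squared error.
# The model, data, and function names are hypothetical, not the card's training code.

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size, accumulation_steps):
    """Average the gradients of several micro-batches before one optimizer step."""
    total = 0.0
    for i in range(accumulation_steps):
        start = i * micro_batch_size
        stop = start + micro_batch_size
        # Dividing by accumulation_steps keeps the result a mean over all samples.
        total += mse_grad(w, xs[start:stop], ys[start:stop]) / accumulation_steps
    return total


xs = [float(i) for i in range(1, 17)]        # 16 samples
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = mse_grad(w, xs, ys)                   # one batch of 16
accum = accumulated_grad(w, xs, ys, 8, 2)    # 2 micro-batches of 8
print(abs(full - accum) < 1e-9)
```

The benefit in practice is memory: only one micro-batch of activations is held at a time, while the optimizer still sees the gradient of the larger batch.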

Model Capabilities

Speech-to-text

Multilingual speech recognition

High-accuracy transcription

Use Cases

Speech Transcription

Automatic Meeting Minutes Transcription

Automatically convert meeting recordings into text transcripts

Highly accurate transcription results

Voice Assistant

Speech recognition module for voice assistant applications

Fast and accurate voice command recognition

Accessibility Technology

Real-time Caption Generation

Provide real-time caption services for the hearing impaired

Low-latency, high-accuracy caption output

🚀 test-model

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset. At the final logged checkpoint it reaches a validation loss of 0.0161 and a word error rate (Wer) of 0.0141 on the evaluation set.
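
A minimal inference sketch, assuming the checkpoint loads through the Transformers `pipeline` API as wav2vec2 speech recognition checkpoints normally do. The function name `transcribe` is hypothetical, and `model_id` is a placeholder because the source does not state the repository name.

```python
def transcribe(audio_path, model_id):
    """Transcribe one audio file with a fine-tuned wav2vec2 checkpoint.

    model_id is a placeholder: pass the Hub repository name or a local
    directory holding the checkpoint. The source does not state it.
    """
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Passing a file path relies on ffmpeg being installed for audio decoding.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("recording.wav", model_id)` should return the transcript as a plain string. wav2vec2 models expect 16 kHz audio, and the pipeline resamples file inputs to the rate the checkpoint's feature extractor declares.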

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

🔧 Technical Details

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 30
mixed_precision_training: Native AMP
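
Two of these settings are easier to read with a little arithmetic. The total batch size is train_batch_size times gradient_accumulation_steps, 8 × 2 = 16. The linear scheduler with 500 warmup steps can be sketched as below. The function name is hypothetical, and the default `total_steps` of 42,000 is an estimate inferred from the results table (step 2800 falls at epoch 2.0, so roughly 1,400 optimizer steps per epoch over 30 epochs), not a value stated in the source.

```python
def linear_schedule_lr(step, base_lr=3e-4, warmup_steps=500, total_steps=42_000):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps.

    total_steps is an estimate (about 1,400 steps per epoch * 30 epochs).
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


effective_batch_size = 8 * 2  # train_batch_size * gradient_accumulation_steps

# Print the learning rate at a few points along the run.
for step in (0, 250, 500, 21_250, 42_000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 0.0003 at step 500 and has fallen to half of that by step 21,250, midway through the decay phase.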

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
4.8062	0.29	400	2.0576	1.0
0.9633	0.57	800	0.5862	0.6023
0.6079	0.86	1200	0.4897	0.4824
0.4993	1.14	1600	0.3823	0.3989
0.4269	1.43	2000	0.3749	0.3761
0.4049	1.72	2400	0.3501	0.3536
0.3998	2.0	2800	0.3527	0.3381
0.3172	2.29	3200	0.3188	0.3257
0.3161	2.57	3600	0.3217	0.3185
0.3213	2.86	4000	0.2988	0.3007
0.3035	3.15	4400	0.3036	0.3288
0.261	3.43	4800	0.3095	0.2947
0.2639	3.72	5200	0.2818	0.2767
0.2771	4.0	5600	0.2739	0.2812
0.2343	4.29	6000	0.2820	0.2700
0.2452	4.57	6400	0.2663	0.2697
0.2344	4.86	6800	0.2679	0.2666
0.2215	5.15	7200	0.2687	0.2571
0.2032	5.43	7600	0.2791	0.2624
0.2092	5.72	8000	0.2682	0.2616
0.2122	6.0	8400	0.2770	0.2591
0.1878	6.29	8800	0.2760	0.2584
0.1884	6.58	9200	0.2641	0.2515
0.194	6.86	9600	0.2500	0.2415
0.175	7.15	10000	0.2635	0.2532
0.1658	7.43	10400	0.2588	0.2371
0.177	7.72	10800	0.2813	0.2493
0.1786	8.01	11200	0.2628	0.2437
0.1509	8.29	11600	0.2592	0.2453
0.1597	8.58	12000	0.2737	0.2523
0.1646	8.86	12400	0.2556	0.2436
0.1587	9.15	12800	0.2669	0.2453
0.1489	9.44	13200	0.2596	0.2353
0.1468	9.72	13600	0.2620	0.2419
0.1482	10.01	14000	0.2622	0.2334
0.1285	10.29	14400	0.2531	0.2258
0.1335	10.58	14800	0.2512	0.2273
0.1335	10.86	15200	0.2475	0.2246
0.132	11.15	15600	0.2575	0.2275
0.1249	11.44	16000	0.2503	0.2223
0.1229	11.72	16400	0.2817	0.2297
0.1274	12.01	16800	0.2707	0.2211
0.1115	12.29	17200	0.2647	0.2175
0.117	12.58	17600	0.2501	0.2178
0.1164	12.87	18000	0.2579	0.2216
0.1085	13.15	18400	0.2636	0.2130
0.1033	13.44	18800	0.2643	0.2184
0.1066	13.72	19200	0.2519	0.2158
0.1032	14.01	19600	0.2322	0.2082
0.0981	14.3	20000	0.2613	0.2125
0.1009	14.58	20400	0.2479	0.2076
0.1	14.87	20800	0.2464	0.2058
0.0886	15.15	21200	0.2595	0.2014
0.0888	15.44	21600	0.2565	0.2048
0.0916	15.73	22000	0.2470	0.2000
0.095	16.01	22400	0.2539	0.1997
0.0875	16.3	22800	0.2576	0.1995
0.0833	16.58	23200	0.2514	0.1990
0.0813	16.87	23600	0.2522	0.2020
0.0845	17.16	24000	0.2522	0.2045
0.0879	17.44	24400	0.2629	0.2183
0.0854	17.73	24800	0.2464	0.2000
0.0795	18.01	25200	0.2526	0.2078
0.075	18.3	25600	0.2519	0.1971
0.0724	18.58	26000	0.2551	0.1965
0.0735	18.87	26400	0.2536	0.1934
0.0735	19.16	26800	0.2504	0.1916
0.0676	19.44	27200	0.2532	0.1884
0.0687	19.73	27600	0.2498	0.1849
0.0652	20.01	28000	0.2490	0.1847
0.0617	20.3	28400	0.2547	0.1899
0.0627	20.59	28800	0.2509	0.1834
0.0639	20.87	29200	0.2472	0.1812
0.0611	21.16	29600	0.2486	0.1827
0.0559	21.44	30000	0.2530	0.1825
0.0564	21.73	30400	0.2484	0.1785
0.0593	22.02	30800	0.2425	0.1781
0.0517	22.3	31200	0.2613	0.1775
0.0528	22.59	31600	0.2517	0.1759
0.0556	22.87	32000	0.2494	0.1811
0.0507	23.16	32400	0.2522	0.1761
0.0485	23.45	32800	0.2344	0.1717
0.0504	23.73	33200	0.2458	0.1772
0.0485	24.02	33600	0.2497	0.1748
0.0436	24.3	34000	0.2405	0.1738
0.0468	24.59	34400	0.2446	0.1735
0.0443	24.87	34800	0.2514	0.1709
0.0417	25.16	35200	0.2515	0.1711
0.0399	25.45	35600	0.2452	0.1664
0.0416	25.73	36000	0.2438	0.1664
0.0412	26.02	36400	0.2457	0.1662
0.0406	26.3	36800	0.2475	0.1659
0.0376	26.59	37200	0.2454	0.1682
0.0365	26.88	37600	0.2511	0.1650
0.0355	27.16	38000	0.2518	0.1633
0.032	27.45	38400	0.2479	0.1604
0.0348	27.73	38800	0.2391	0.1599
0.0331	28.02	39200	0.2417	0.1617
0.0349	28.31	39600	0.2358	0.1590
0.0347	28.59	40000	0.2388	0.1582
0.0325	28.88	40400	0.2412	0.1564
0.0332	29.16	40800	0.2390	0.1545
0.0613	29.45	41200	0.0167	0.0141
0.0563	29.74	41600	0.0161	0.0141
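
The Wer column is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below computes it for a single sentence pair with a word-level edit distance. The function name is hypothetical, and the source does not say which metric implementation produced the table, so treat this as an illustration of the definition only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A value of 1.0, as in the first row, means the number of errors equals the number of reference words. Corpus-level scores are usually obtained by summing errors and reference words over all utterances before dividing.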

Framework versions

Transformers 4.17.0
PyTorch 1.8.1+cu111
Datasets 2.2.1
Tokenizers 0.12.1

📄 License

This model is licensed under the Apache-2.0 license.
