# wav2vec2 fine-tuning
Wav2vec2 Base 100k Gtzan Music Genres Finetuned Wav2vec2 Ivan
A music genre classification model based on the wav2vec2 architecture, fine-tuned on the GTZAN dataset with 98% accuracy
Audio Classification
Transformers

itmanov
32
1
Wav2vec2 Large Xlsr 53 Serbian Smart Home Commands
MIT
A wav2vec2-based Serbian smart home voice command recognition model capable of identifying 7 control commands
Audio Classification Other
mradovic38
320
0
Wav2vec2 ASV Deepfake Audio Detection
Apache-2.0
A deepfake audio detection model fine-tuned from facebook/wav2vec2-base to identify synthetic or tampered speech
Speaker Analysis
Transformers

Bisher
106
1
Japanese Wav2vec2 Base Rs35kh
Apache-2.0
A wav2vec 2.0 Base model fine-tuned on the large-scale Japanese automatic speech recognition corpus ReazonSpeech v2.0, suitable for Japanese automatic speech recognition tasks.
Speech Recognition
Transformers Japanese

reazon-research
3,968
1
Audio Emotion Detection
Apache-2.0
This model is fine-tuned from facebook/wav2vec2-large-xlsr-53 for audio emotion detection, capable of recognizing 7 emotional states
Audio Classification
Transformers

Hatman
630
8
Wav2vec2 Large Xls R 300m Amharic Demo Colab
Apache-2.0
An Amharic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the common_voice_16_1 dataset
Speech Recognition
Transformers

DipsankarSinha
18
2
Wav2vec2 Large Lv60 Phoneme Timit English Timit 4k
Apache-2.0
English phoneme recognition model fine-tuned from facebook/wav2vec2-large-lv60, achieving a phoneme error rate of 10.53% on the TIMIT dataset
Speech Recognition
Transformers English

excalibur12
306
3
Wav2vec2 Large Xlrs Korean V5
Apache-2.0
A Korean automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the zeroth_korean dataset, with a word error rate of 0.2433.
Speech Recognition
Transformers

student-47
285
1
Deepfake Audio Detection
Apache-2.0
A deepfake audio detection model fine-tuned from facebook/wav2vec2-base, achieving 95.45% accuracy on the evaluation set
Audio Classification
Transformers

Heem2
246
0
Deeepfake Audio Recognition Ttoo
Apache-2.0
A deepfake audio detection model fine-tuned from facebook/wav2vec2-base, achieving 95.45% accuracy on the evaluation set
Audio Classification
Transformers

Hemg
20
0
Wav2vec Fine Tuned Speech Command2
Apache-2.0
A voice command classification model fine-tuned from facebook/wav2vec2-base on the speech_commands dataset, achieving 97.35% accuracy
Audio Classification
Transformers

Thamer
16
0
Viet Tones Model
A Vietnamese tone recognition model fine-tuned from wav2vec2-base-vietnamese-250h, with an accuracy of 59.72%
Speech Recognition
Transformers

StevenLe456
22
0
Asr Wav2vec2 Commonvoice 14 Zh CN
Apache-2.0
This is an end-to-end automatic speech recognition system trained on the CommonVoice Chinese dataset, using wav2vec2.0 and CTC architecture, supporting Chinese speech recognition.
Speech Recognition Chinese
speechbrain
36
9
Asr Wav2vec2 Commonvoice 14 Es
Apache-2.0
This is an end-to-end automatic speech recognition system trained on the CommonVoice Spanish dataset, using the wav2vec 2.0 pre-trained model combined with a CTC decoder.
Speech Recognition Spanish
speechbrain
22
3
Wav2vec2 Nepali
A Nepali speech recognition model fine-tuned from Facebook's wav2vec2 model
Speech Recognition
Transformers Other

anish-shilpakar
312
1
SER Wav2vec2 Large Xlsr 53 Eng Zho Adults
A cross-lingual, cross-age-group speech emotion recognition model fine-tuned from wav2vec2-large-xlsr-53, supporting English and Chinese
Audio Classification
Transformers Supports Multiple Languages

CAiRE
32
0
Speechcommand Demo
Apache-2.0
A voice command classification model fine-tuned from facebook/wav2vec2-base on the superb dataset, with an accuracy of 98.09%
Audio Classification
Transformers

SHENMU007
18
0
Wav2vec2 Base Finetuned Speech Commands V0.02
Apache-2.0
A voice command recognition model fine-tuned from facebook/wav2vec2-base on the speech_commands dataset, achieving an accuracy of 97.59%.
Audio Classification
Transformers

0xb1
1.2M
0
Ser Model Adjusted 2023 03 03
Apache-2.0
A speech emotion recognition model fine-tuned from facebook/wav2vec2-base, achieving an accuracy of 75.73% on the evaluation set
Audio Classification
Transformers

aherzberg
18
0
Wav2vec2 Base Drum Kit Sounds
Apache-2.0
A multi-class audio classification model fine-tuned from facebook/wav2vec2-base to recognize drum instrument sounds
Audio Classification
Transformers English

DunnBC22
15
4
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset in the Google Colab environment.
Speech Recognition
Transformers

pannaga
16
0
Ai Light Dance Singing Ft Pretrain Wav2vec2 Large Lv60
An automatic speech recognition (ASR) model based on the wav2vec2-large-lv60 architecture and fine-tuned on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING dataset, primarily for singing voice recognition.
Speech Recognition
Transformers

gary109
22
0
Asr Wav2vec2 Dvoice Amharic
Apache-2.0
This is an automatic speech recognition model for Amharic, trained using the wav2vec 2.0 architecture with a CTC/attention mechanism.
Speech Recognition Other
speechbrain
96
9
Asr Wav2vec2 Dvoice Darija
Apache-2.0
This is an automatic speech recognition model for the Moroccan Arabic dialect (Darija), fine-tuned on the DVoice dataset based on the wav2vec 2.0 architecture.
Speech Recognition Other
speechbrain
120
11
Asr Wav2vec2 Librispeech
Apache-2.0
This is an end-to-end automatic speech recognition system trained on the LibriSpeech dataset, combining the wav2vec 2.0 pre-trained model and CTC technology, excelling in English speech recognition tasks.
Speech Recognition English
speechbrain
1,667
9
Wav2vec2 Base Common Voice Persian Colab
Apache-2.0
A Persian speech recognition model fine-tuned from facebook/wav2vec2-base, primarily used for Persian speech-to-text tasks.
Speech Recognition
Transformers

zoha
21
0
English Filipino Wav2vec2 L Xls R Test 05
Apache-2.0
A speech recognition model fine-tuned from the wav2vec2-large-xlsr-53-english model on Filipino speech datasets, supporting English and Filipino speech-to-text tasks.
Speech Recognition
Transformers

Khalsuu
67
1
Speech Processing Project Wav2vec2
Apache-2.0
A speech recognition model fine-tuned from kingabzpro/wav2vec2-urdu.
Speech Recognition
Transformers

Raffay
21
0
Filipino Wav2vec2 L Xls R 300m Test
Apache-2.0
A Filipino speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the filipino_voice dataset.
Speech Recognition
Transformers

Khalsuu
5,738
0
Wav2vec2 Large 960h Lv60 Self MIDIARIES 72H FT
A speech recognition model fine-tuned using 72 hours of MI diary data, based on Facebook's pre-trained wav2vec2 large 960H lv60 self-supervised model
Speech Recognition
Transformers

caurdy
20
0
Wav2vec2 Base Common Voice Fa Demo Colab
Apache-2.0
A Persian speech recognition model fine-tuned from facebook/wav2vec2-base, suitable for Persian speech-to-text tasks.
Speech Recognition
Transformers

zoha
15
0
Wav2vec2 Cv Be
Gpl-3.0
An automatic speech recognition system fine-tuned from the facebook/wav2vec2-base model on the Common Voice 8 Belarusian dataset
Speech Recognition
Transformers Other

ales
278
1
Wav2vec2 Xls R 300m Es
Apache-2.0
A Spanish automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - ES dataset.
Speech Recognition
Transformers Spanish

samitizerxu
23
0
Asr Wav2vec2 Commonvoice Fr
Apache-2.0
A wav2vec 2.0 speech recognition model trained on the CommonVoice French dataset, using a CTC/attention architecture that does not require a language model
Speech Recognition French
speechbrain
250
10
Wav2vec2 Latino40
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, supporting Latin language speech processing
Speech Recognition
Transformers

cristinakuo
17
0
Bangla Asr
A fine-tuned Bengali automatic speech recognition (ASR) model based on Harveenchadha/vakyansh-wav2vec2-bengali-bnm-200
Speech Recognition
Transformers

danielbubiola
17
0
Wav2vec2 Xls R Parlaspeech Hr
A Croatian automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m, trained on 300 hours of parliamentary speech data
Speech Recognition
Transformers Other

classla
46.84k
3
Asr Wav2vec2 Commonvoice Rw
Apache-2.0
This is an end-to-end model for automatic speech recognition in Kinyarwanda, based on the wav2vec 2.0 pre-trained model combined with CTC and attention mechanisms, fine-tuned on the CommonVoice dataset.
Speech Recognition Other
speechbrain
28
1
Wav2vec2 Large Xls R 300m Spanish Small
This is a Spanish speech recognition model based on the wav2vec2 architecture, fine-tuned on the Common Voice dataset with a word error rate (WER) of 0.2105.
Speech Recognition
Transformers

glob-asr
58
0
Wav2vec2 Large Xls R 300m Basque
Apache-2.0
An automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Basque Common Voice dataset
Speech Recognition
Transformers Other

deepdml
31
0
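Several of the speech recognition entries above report a word error rate (WER), for example 0.2433 and 0.2105. As a rough illustration of what that number means, the following is a minimal sketch of WER computed as word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the listed models may have been scored with different text normalization or tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration: words are split on whitespace,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn on the light", "turn off the light"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A value such as 0.2433 corresponds to roughly one error per four reference words.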