# wav2vec2 fine-tuning

Wav2vec2 Base 100k Gtzan Music Genres Finetuned Wav2vec2 Ivan

A music genre classification model based on the wav2vec2 architecture, fine-tuned on the GTZAN dataset with 98% accuracy

Audio Classification

Wav2vec2 Large Xlsr 53 Serbian Smart Home Commands

A wav2vec2-based Serbian smart home voice command recognition model capable of identifying 7 control commands

Audio Classification Other

Wav2vec2 ASV Deepfake Audio Detection

A deepfake audio detection model fine-tuned from facebook/wav2vec2-base to identify synthetic or tampered speech

Speaker Analysis

Japanese Wav2vec2 Base Rs35kh

A wav2vec 2.0 Base model fine-tuned on the large-scale Japanese automatic speech recognition corpus ReazonSpeech v2.0, suitable for Japanese automatic speech recognition tasks.

Speech Recognition

Transformers Japanese

reazon-research

Audio Emotion Detection

An audio emotion detection model fine-tuned from facebook/wav2vec2-large-xlsr-53 that recognizes 7 emotional states

Audio Classification

Wav2vec2 Large Xls R 300m Amharic Demo Colab

An Amharic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the common_voice_16_1 dataset

Speech Recognition

Wav2vec2 Large Lv60 Phoneme Timit English Timit 4k

English phoneme recognition model fine-tuned from facebook/wav2vec2-large-lv60, achieving a phoneme error rate of 10.53% on the TIMIT dataset

Speech Recognition

Transformers English

Wav2vec2 Large Xlrs Korean V5

A Korean automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the zeroth_korean dataset, with a word error rate of 0.2433.

Speech Recognition
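Several entries in this list report a word error rate (WER) as a fraction, so 0.2433 corresponds to roughly 24 errors per 100 reference words. The sketch below shows the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative, and the listed models may have been scored with a dedicated evaluation library and different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.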

Deepfake Audio Detection

A deepfake audio detection model fine-tuned from facebook/wav2vec2-base, achieving 95.45% accuracy on the evaluation set

Audio Classification

Deeepfake Audio Recognition Ttoo

A deepfake audio detection model fine-tuned from facebook/wav2vec2-base, achieving 95.45% accuracy on the evaluation set

Audio Classification

Wav2vec Fine Tuned Speech Command2

A speech command classification model fine-tuned from facebook/wav2vec2-base on the speech_commands dataset, achieving 97.35% accuracy

Audio Classification

Viet Tones Model

A Vietnamese tone recognition model fine-tuned from wav2vec2-base-vietnamese-250h, with an accuracy of 59.72%

Speech Recognition

Asr Wav2vec2 Commonvoice 14 Zh CN

This is an end-to-end automatic speech recognition system trained on the CommonVoice Chinese dataset, using wav2vec2.0 and CTC architecture, supporting Chinese speech recognition.

Speech Recognition Chinese
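Many of the recognizers listed here pair wav2vec 2.0 with a CTC output layer. As a minimal sketch of how CTC output becomes text, greedy decoding takes the most likely token per audio frame, merges consecutive repeats, and then drops the blank token. The vocabulary and frame sequence below are hypothetical; real models ship their own tokenizer, and a given model may use beam search or a language model instead of greedy decoding.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse per-frame token ids into text using the CTC rule."""
    # Step 1: merge runs of identical consecutive ids.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only separate genuine repeats.
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


if __name__ == "__main__":
    # Hypothetical vocabulary: id 0 is the CTC blank.
    vocab = {0: "", 1: "h", 2: "e", 3: "l", 4: "o"}
    # The blank between the two runs of 3 keeps the double "l".
    frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]
    print(ctc_greedy_decode(frames, vocab))
```

The order of the two steps matters: removing blanks first would merge the two "l" runs into one.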

Asr Wav2vec2 Commonvoice 14 Es

This is an end-to-end automatic speech recognition system trained on the CommonVoice Spanish dataset, using the wav2vec 2.0 pre-trained model combined with a CTC decoder.

Speech Recognition Spanish

Wav2vec2 Nepali

A Nepali speech recognition model fine-tuned from Facebook's wav2vec2 model

Speech Recognition

Transformers Other

anish-shilpakar

SER Wav2vec2 Large Xlsr 53 Eng Zho Adults

A cross-lingual, cross-age-group speech emotion recognition model fine-tuned from wav2vec2-large-xlsr-53, supporting English and Chinese

Audio Classification

Transformers Supports Multiple Languages

Speechcommand Demo

A voice command classification model fine-tuned from facebook/wav2vec2-base on the superb dataset, with an accuracy of 98.09%

Audio Classification

Wav2vec2 Base Finetuned Speech Commands V0.02

A voice command recognition model fine-tuned from facebook/wav2vec2-base on the speech_commands dataset, achieving an accuracy of 97.59%.

Audio Classification

Ser Model Adjusted 2023 03 03

A speech emotion recognition model fine-tuned from facebook/wav2vec2-base, achieving an accuracy of 75.73% on the evaluation set

Audio Classification

Wav2vec2 Base Drum Kit Sounds

A multi-class audio classification model fine-tuned from facebook/wav2vec2-base to recognize drum instrument sounds

Audio Classification

Transformers English

Wav2vec2 Base Timit Demo Google Colab

A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, trained in Google Colab.

Speech Recognition

Ai Light Dance Singing Ft Pretrain Wav2vec2 Large Lv60

This model is an automatic speech recognition (ASR) model based on the wav2vec2-large-lv60 architecture, fine-tuned on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING dataset, primarily used for singing voice recognition tasks.

Speech Recognition

Asr Wav2vec2 Dvoice Amharic

This is an automatic speech recognition model for Amharic, trained using wav2vec 2.0 architecture with CTC/Attention mechanism

Speech Recognition Other

Asr Wav2vec2 Dvoice Darija

This is an automatic speech recognition model for the Moroccan Arabic dialect (Darija), fine-tuned on the DVoice dataset based on the wav2vec 2.0 architecture.

Speech Recognition Other

Asr Wav2vec2 Librispeech

This is an end-to-end automatic speech recognition system trained on the LibriSpeech dataset, combining the wav2vec 2.0 pre-trained model and CTC technology, excelling in English speech recognition tasks.

Speech Recognition English

Wav2vec2 Base Common Voice Persian Colab

A speech recognition model fine-tuned from facebook/wav2vec2-base on Persian-language datasets, primarily used for Persian speech-to-text tasks.

Speech Recognition

English Filipino Wav2vec2 L Xls R Test 05

A speech recognition model fine-tuned from wav2vec2-large-xlsr-53-english on Filipino speech datasets, supporting English and Filipino speech-to-text tasks.

Speech Recognition

Speech Processing Project Wav2vec2

A speech processing model fine-tuned from kingabzpro/wav2vec2-urdu for speech recognition tasks.

Speech Recognition

Filipino Wav2vec2 L Xls R 300m Test

A Filipino speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the filipino_voice dataset.

Speech Recognition

Wav2vec2 Large 960h Lv60 Self MIDIARIES 72H FT

A speech recognition model fine-tuned using 72 hours of MI diary data, based on Facebook's pre-trained wav2vec2 large 960H lv60 self-supervised model

Speech Recognition

Wav2vec2 Base Common Voice Fa Demo Colab

A Persian speech recognition model fine-tuned from facebook/wav2vec2-base, suitable for Persian speech-to-text tasks.

Speech Recognition

An automatic speech recognition system fine-tuned from the facebook/wav2vec2-base model on the Common Voice 8 Belarusian dataset

Speech Recognition

Transformers Other

Wav2vec2 Xls R 300m Es

A Spanish automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - ES dataset.

Speech Recognition

Transformers Spanish

Asr Wav2vec2 Commonvoice Fr

A wav2vec 2.0 speech recognition model trained on the CommonVoice French dataset, using a CTC/Attention architecture that does not require a language model

Speech Recognition French

Wav2vec2 Latino40

A speech recognition model fine-tuned from facebook/wav2vec2-base, supporting Latin language speech processing

Speech Recognition

A Bengali automatic speech recognition (ASR) model fine-tuned from Harveenchadha/vakyansh-wav2vec2-bengali-bnm-200

Speech Recognition

Wav2vec2 Xls R Parlaspeech Hr

A Croatian automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m, trained on 300 hours of parliamentary speech data

Speech Recognition

Transformers Other

Asr Wav2vec2 Commonvoice Rw

An end-to-end automatic speech recognition model for Kinyarwanda (the language of Rwanda), based on the wav2vec 2.0 pre-trained model combined with CTC and attention mechanisms, fine-tuned on the CommonVoice dataset.

Speech Recognition Other

Wav2vec2 Large Xls R 300m Spanish Small

This is a Spanish speech recognition model based on the wav2vec2 architecture, fine-tuned on the Common Voice dataset with a word error rate (WER) of 0.2105.

Speech Recognition

Wav2vec2 Large Xls R 300m Basque

An automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Basque Common Voice dataset

Speech Recognition

Transformers Other

© 2025 AIbase