# 16kHz Audio Processing
Mms Lid 126
A spoken language identification model from Facebook's Massively Multilingual Speech (MMS) project, fine-tuned to classify audio into one of 126 languages
Audio Classification
Transformers Supports Multiple Languages

facebook
2.1M
26
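A language identification model of this kind outputs one score (logit) per language. Choosing the answer is then a softmax followed by a ranking. The sketch below shows only that post-processing step, in plain Python. The four-language label set, the logit values, and the `top_k_languages` helper are all hypothetical illustrations. The real model has 126 labels and is not called here.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_languages(logits, labels, k=3):
    """Hypothetical helper: return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical toy label set and logits (the actual model has 126 languages).
labels = ["eng", "deu", "tur", "hin"]
logits = [2.0, 0.5, -1.0, 0.1]

for lang, p in top_k_languages(logits, labels, k=2):
    print(f"{lang}: {p:.3f}")
```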
Assignment1 Omar
Apache-2.0
A Wav2Vec2 speech recognition model (self-supervised pre-training followed by fine-tuning on 960 hours of LibriSpeech audio) for transcribing English speech.
Speech Recognition
Transformers English

Classroom-workshop
28
0
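Wav2Vec2 speech recognition checkpoints are commonly fine-tuned with a CTC head, which emits one token prediction per audio frame. Assuming that setup, a minimal greedy decoder does three things:

- collapses consecutive repeated predictions,
- drops the blank token,
- maps the word-delimiter token to a space.

The toy vocabulary, `blank_id=0`, and the `|` delimiter below are assumptions chosen for illustration. They are not read from any real checkpoint.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding over per-frame argmax token ids.

    A token is emitted only when it differs from the previous frame's id
    and is not the blank, so a blank between two identical ids lets a
    genuine double letter survive.
    """
    tokens = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx  # track every frame, including blanks
    return "".join(tokens).replace(word_delimiter, " ").strip()

# Hypothetical toy vocabulary: id 0 is the blank/pad token, "|" separates words.
vocab = {0: "<pad>", 1: "|", 2: "C", 3: "A", 4: "T", 5: "S"}

# Per-frame argmax ids as a CTC model might emit them.
frames = [2, 2, 0, 3, 4, 4, 1, 0, 5, 3, 3, 0, 4]
print(ctc_greedy_decode(frames, vocab))
```

Beam search with a language model usually lowers error rates further. Greedy decoding is shown here only because it is the simplest correct baseline.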
Wav2vec2 Conformer Rel Pos Large 100h Ft
Apache-2.0
A large Wav2Vec2-Conformer speech recognition model using relative positional embeddings, fine-tuned on 100 hours of LibriSpeech data
Speech Recognition
Transformers English

facebook
99
0
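Models in the Wav2Vec2 family first pass raw 16 kHz audio through a stack of 1-D convolutions. As a result, the transformer sees far fewer frames than there are input samples. The sketch below computes the frame count layer by layer. It assumes the default Hugging Face `Wav2Vec2Config` settings (`conv_kernel=(10, 3, 3, 3, 3, 2, 2)`, `conv_stride=(5, 2, 2, 2, 2, 2, 2)`) and that this Conformer variant shares them. Check the checkpoint's own config before relying on these numbers.

```python
import math

# Assumed defaults of the Wav2Vec2 convolutional feature encoder.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

def feature_encoder_frames(num_samples, kernels=CONV_KERNELS, strides=CONV_STRIDES):
    """Output length of a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length

total_stride = math.prod(CONV_STRIDES)  # samples between consecutive frames

print(feature_encoder_frames(16_000))   # 1 s of 16 kHz audio
print(feature_encoder_frames(160_000))  # 10 s of 16 kHz audio
print(total_stride)
```

The strides multiply to 320 samples, so the encoder produces roughly one frame every 20 ms at 16 kHz. Because the convolutions are unpadded, one second of audio yields 49 frames rather than 50.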
Wav2vec2 Base Superb Sv
Apache-2.0
This is a speaker verification model based on the Wav2Vec2 architecture, specifically designed for the speaker verification task in the SUPERB benchmark.
Speaker Analysis
Transformers English

anton-l
901
3
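Speaker verification is usually framed as comparing two utterance-level speaker embeddings. The system accepts "same speaker" when their similarity clears a threshold. The sketch below assumes the embeddings have already been extracted and uses cosine similarity. The 0.8 threshold and the toy 3-dimensional vectors are hypothetical. In practice the threshold is tuned on held-out trial pairs, and real embeddings have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb_a, emb_b, threshold=0.8):
    """Accept the trial when similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

enrolled = [1.0, 0.0, 0.0]  # hypothetical enrollment embedding
trial_1 = [0.9, 0.1, 0.0]   # close in direction: likely the same speaker
trial_2 = [0.0, 1.0, 0.0]   # orthogonal: likely a different speaker

print(same_speaker(enrolled, trial_1), same_speaker(enrolled, trial_2))
```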
Hubert Base Superb Ic
Apache-2.0
A speech intent classification model fine-tuned on the SUPERB intent classification task, based on the Hubert-Base-LS960 pre-trained model
Audio Classification
Transformers English

superb
578
0
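Utterance-level classifiers built on Hubert or Wav2Vec2, including intent classification, need one label per clip but receive one feature vector per frame. A common way to bridge that gap is to average the frame features over time and apply a linear layer. The sketch below shows that pattern in plain Python. The frame features, weights, and intent names are hypothetical. It is not the exact head of this checkpoint, which may add a projection layer or use several output slots.

```python
def mean_pool(frames):
    """Average frame-level feature vectors into one utterance vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def classify(frames, weights, bias, labels):
    """Mean-pool over time, score each class, return the top label."""
    logits = linear(mean_pool(frames), weights, bias)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]

# Hypothetical 2-dimensional frame features and a 2-class head.
frames = [[1.0, 0.0], [3.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.5]
intents = ["lights_on", "music_play"]  # hypothetical intent labels

print(classify(frames, weights, bias, intents))
```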
Wav2vec2 Base Superb Sid
Apache-2.0
A speaker identification model fine-tuned on the VoxCeleb1 dataset from the Wav2Vec2-base pre-trained model, designed to classify utterances by speaker identity
Speaker Analysis
Transformers English

superb
1,489
20
Wav2vec2 Base Superb Er
Apache-2.0
This is a speech emotion recognition model based on the Wav2Vec2 architecture, adapted from the S3PRL project, designed to identify emotional categories in speech.
Audio Classification
Transformers English

superb
28.14k
11
Hubert Base Superb Ks
Apache-2.0
A keyword spotting model based on the Hubert architecture that classifies speech segments into a predefined set of keywords.
Audio Classification
Transformers English

superb
11.29k
8
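Keyword spotting classifiers typically score short fixed-length clips. To spot keywords in a longer recording, you can slide a window over the audio and classify each window separately. The sketch below only computes the window boundaries. It assumes 16 kHz audio, 1-second windows, and a 0.5-second hop, all of which are hypothetical choices to adjust to the model's training clips.

```python
def sliding_windows(num_samples, sample_rate=16_000, window_s=1.0, hop_s=0.5):
    """Return (start, end) sample indices of fixed-length analysis windows.

    Audio shorter than one window is returned as a single (short) window;
    trailing samples that do not fill a full window after the last hop
    are not covered.
    """
    win = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if num_samples < win:
        return [(0, num_samples)]
    return [(start, start + win) for start in range(0, num_samples - win + 1, hop)]

# 2.5 s of 16 kHz audio -> four overlapping 1 s windows.
print(sliding_windows(40_000))
```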
Wav2vec2 Large Xlsr Turkish Artificial Cv
Apache-2.0
This is a Turkish automatic speech recognition model based on the XLSR Wav2Vec2 architecture, fine-tuned on the Common Voice Turkish dataset.
Speech Recognition Other
cahya
26
0
Wav2vec2 Large Superb Er
Apache-2.0
This is an emotion recognition model based on the Wav2Vec2-Large model, specifically designed to identify emotion categories from speech.
Audio Classification
Transformers English

superb
1,442
1
Sew D Mid 400k Ft Ls100h
Apache-2.0
SEW-D-mid is a pre-trained speech model from ASAPP Research, here fine-tuned for automatic speech recognition, that aims for a good balance between performance and efficiency.
Speech Recognition
Transformers English

asapp
20
1
Wav2vec2 Large Robust Ft Swbd 300h
Apache-2.0
A version of Facebook's Wav2Vec2-Large-Robust fine-tuned for telephone speech recognition on 300 hours of the Switchboard telephone speech corpus.
Speech Recognition
Transformers English

facebook
2,543
20
Hubert Base Superb Sid
Apache-2.0
A Hubert-based speaker identification model fine-tuned for the SUPERB speaker identification task
Speaker Analysis
Transformers English

superb
673
1
Wave2vec2 Large Xlsr Hindi
Apache-2.0
A Hindi speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained using OpenSLR and Common Voice Hindi datasets, supporting 16kHz sampling rate audio input.
Speech Recognition
Transformers Other

shiwangi27
63
1
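Because this model (like most checkpoints listed here) expects 16 kHz input, audio recorded at other rates has to be resampled first. The sketch below uses plain linear interpolation. It applies no anti-aliasing filter, which makes it acceptable for illustration and for upsampling (for example 8 kHz to 16 kHz). For downsampling, prefer a filtered resampler such as `torchaudio.functional.resample` or `librosa.resample`. The function name `resample_linear` is a hypothetical helper.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (no low-pass filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # position in the source signal
        left = int(pos)
        frac = pos - left
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out

# Upsample a short 8 kHz ramp to 16 kHz.
print(resample_linear([0.0, 1.0, 2.0, 3.0], src_rate=8_000))
```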
Wav2vec2 Large Superb Ic
Apache-2.0
Intent classification model based on Wav2Vec2-Large-LV60, fine-tuned on the SUPERB intent classification task for speech command intent recognition
Audio Classification
Transformers English

superb
110
1
Sew D Tiny 100k Ft Ls100h
Apache-2.0
SEW-D-tiny is an efficient pre-trained speech recognition model from ASAPP Research that balances performance and efficiency.
Speech Recognition
Transformers English

asapp
24.55k
2
Hubert Base Superb Er
Apache-2.0
An emotion recognition model based on the Hubert-Base architecture, trained on the SUPERB emotion recognition task to classify speech by emotion
Audio Classification
Transformers English

superb
7,887
20
Wav2vec2 Base Superb Ic
Apache-2.0
An intent classification model based on Wav2Vec2-base that classifies spoken commands into predefined intent categories.
Audio Classification
Transformers English

superb
779
0
Sew Tiny 100k Ft Ls100h
Apache-2.0
SEW (Squeezed and Efficient Wav2vec) is a speech recognition pre-trained model developed by ASAPP Research, outperforming wav2vec 2.0 in both performance and efficiency.
Speech Recognition
Transformers Supports Multiple Languages

asapp
736
1
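The "squeezed" in SEW refers to shortening the frame sequence before the transformer encoder and restoring its length afterwards, which cuts the cost of attention. The sketch below shows that idea with a hypothetical squeeze factor of 2. It uses average pooling to squeeze, simple frame repetition to restore the length, and zero-padding to fill any tail lost to pooling. Our understanding is that the real model learns its upsampling with a projection, so the repetition here is a deliberate simplification, not the model's implementation.

```python
def squeeze(frames, factor=2):
    """Average-pool consecutive frames along time; a ragged tail is dropped."""
    usable = len(frames) - len(frames) % factor
    return [
        [sum(frames[t + j][d] for j in range(factor)) / factor
         for d in range(len(frames[0]))]
        for t in range(0, usable, factor)
    ]

def unsqueeze(frames, factor=2, target_len=None):
    """Repeat each frame `factor` times, then zero-pad or trim to target_len."""
    out = [list(frame) for frame in frames for _ in range(factor)]
    if target_len is not None:
        dim = len(frames[0]) if frames else 0
        out += [[0.0] * dim for _ in range(max(0, target_len - len(out)))]
        out = out[:target_len]
    return out

# Five 1-dimensional frames -> two squeezed frames -> back to five frames.
frames = [[1.0], [3.0], [5.0], [7.0], [9.0]]
short = squeeze(frames)
print(short)
print(unsqueeze(short, target_len=len(frames)))
```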
Sew D Mid K127 400k Ft Ls100h
Apache-2.0
SEW-D-mid-k127 is an efficient pre-trained speech recognition model from ASAPP Research, reported to improve on wav2vec 2.0 in both performance and efficiency.
Speech Recognition
Transformers English

asapp
16
0
Wav2vec2 Large Xlsr 53 German
Apache-2.0
Large-scale German automatic speech recognition (ASR) model based on Facebook's Wav2Vec2 architecture, fine-tuned on the Common Voice German dataset
Speech Recognition German
facebook
1,767
3
Wav2vec2 Large Superb Sid
Apache-2.0
Speaker identification model based on the Wav2Vec2-Large architecture, trained on the VoxCeleb1 dataset for classifying speech by speaker identity
Speaker Analysis
Transformers English

superb
27
1
Hubert Large Superb Er
Apache-2.0
An emotion recognition model built on the pre-trained Hubert-Large model, predicting emotion categories in speech
Audio Classification
Transformers English

superb
10.24k
21
Wav2vec2 Large Xlsr Bengali
A Bengali automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on 40,000 speech samples from the OpenSLR dataset
Speech Recognition Other
arijitx
758
6
Unispeech Sat Base 100h Libri Ft
Apache-2.0
An automatic speech recognition model based on the UniSpeech-SAT base model, fine-tuned on 100 hours of LibriSpeech data
Speech Recognition
Transformers English

microsoft
643
4
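Speech recognition models fine-tuned on LibriSpeech, such as this one, are usually compared by word error rate (WER). WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance. It assumes plain whitespace tokenization and no text normalization; real evaluations usually lowercase and strip punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = cur
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```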