
# 16kHz Audio Processing

A language identification model fine-tuned from Facebook's Massively Multilingual Speech (MMS) project that classifies spoken audio into one of 126 languages.
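Language identification models like this one typically output one raw score (logit) per language, and the prediction is read off after a softmax. The minimal sketch below shows only that post-processing step in plain Python. The four-language label set and the logit values are hypothetical stand-ins for the model's 126-language output, not real model scores.

```python
import math


def top_k_languages(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Numerically stable softmax: subtract the largest logit before exp().
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank labels by probability, highest first.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical toy label set and logits (the real model has 126 classes).
labels = ["eng", "deu", "tur", "hin"]
logits = [2.0, 0.5, -1.0, 0.1]
for lang, prob in top_k_languages(logits, labels, k=2):
    print(f"{lang}: {prob:.3f}")
```

Returning the top few candidates rather than a single argmax is useful when closely related languages receive similar scores.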

Audio Classification

Transformers Supports Multiple Languages

Assignment1 Omar

Wav2Vec2 is a speech recognition model pre-trained with self-supervised learning; this checkpoint is pre-trained and fine-tuned on 960 hours of LibriSpeech audio for English speech transcription.
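Wav2Vec2 reads raw 16kHz waveforms through a stack of 1-D convolutions before its Transformer layers, so the clip length determines how many frames the model sees. The sketch below computes that frame count, assuming the default `Wav2Vec2Config` feature-encoder settings (`conv_kernel=(10, 3, 3, 3, 3, 2, 2)`, `conv_stride=(5, 2, 2, 2, 2, 2, 2)`). A checkpoint with a different config would give different numbers.

```python
# Assumed default Wav2Vec2Config feature-encoder settings (7 conv layers).
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples: int) -> int:
    """Number of encoder frames produced from a waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


SAMPLE_RATE = 16_000
print(num_frames(SAMPLE_RATE))  # 49 frames for one second of audio
```

The total stride is 5 * 2^6 = 320 samples, i.e. one frame every 20 ms at 16kHz, which is why one second of audio yields roughly 50 frames.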

Speech Recognition

Transformers English

Classroom-workshop

Wav2vec2 Conformer Rel Pos Large 100h Ft

A large Wav2Vec2-Conformer speech recognition model with relative position embeddings, fine-tuned on 100 hours of LibriSpeech data.
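Speech recognition checkpoints of this kind are typically fine-tuned with a CTC (connectionist temporal classification) head: each ~20 ms frame gets a token ID, and the transcript is recovered by merging repeated IDs and dropping the blank token. The sketch below shows greedy CTC decoding on hypothetical frame IDs. The vocabulary, with `<pad>` as the blank and `|` as the word delimiter, is an assumption loosely modeled on the Wav2Vec2 CTC tokenizer convention, not this checkpoint's actual vocabulary.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding: merge repeated IDs, drop blanks, map IDs to text."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the ID changes and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    # The word-delimiter token marks word boundaries; turn it into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical character vocabulary; ID 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "|", 2: "C", 3: "A", 4: "T", 5: "S"}
# Hypothetical per-frame argmax IDs, one per ~20 ms frame.
frames = [0, 2, 2, 0, 3, 4, 4, 1, 0, 5, 3, 0, 4]
print(ctc_greedy_decode(frames, vocab))  # CAT SAT
```

A blank between two identical IDs is what lets CTC spell doubled letters: `[2, 0, 2]` decodes to `CC`, while `[2, 2]` decodes to a single `C`.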

Speech Recognition

Transformers English

Wav2vec2 Base Superb Sv

This is a speaker verification model based on the Wav2Vec2 architecture, specifically designed for the speaker verification task in the SUPERB benchmark.
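Speaker verification usually reduces to comparing two utterance embeddings: if their similarity clears a threshold, the pair is accepted as the same speaker. The sketch below assumes fixed-length embeddings have already been extracted from the model and scores them with cosine similarity. The `0.7` threshold and the toy 3-dimensional vectors are hypothetical placeholders; a real threshold would need tuning on held-out trial pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.7):
    """Accept the trial if similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 3-dimensional embeddings; real speaker embeddings are much larger.
enrolled = [0.9, 0.1, 0.2]
test_clip = [0.8, 0.2, 0.1]
print(same_speaker(enrolled, test_clip))
```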

Speaker Analysis

Transformers English

Hubert Base Superb Ic

A speech intent classification model based on the Hubert-Base-LS960 pre-trained model and fine-tuned on the SUPERB intent classification task.

Audio Classification

Transformers English

Wav2vec2 Base Superb Sid

A speaker identification model based on the Wav2Vec2-base pre-trained model, fine-tuned on the VoxCeleb1 dataset to classify speech by speaker identity.

Speaker Analysis

Transformers English

Wav2vec2 Base Superb Er

This is a speech emotion recognition model based on the Wav2Vec2 architecture, adapted from the S3PRL project, designed to identify emotional categories in speech.

Audio Classification

Transformers English

Hubert Base Superb Ks

This is a keyword spotting model based on the Hubert architecture that classifies speech segments into a predefined set of keywords.
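Keyword spotting models of this kind classify short, fixed-length clips (the Speech Commands recordings used for SUPERB keyword spotting are 1 second long), so longer recordings are usually cut into overlapping windows first. The sketch below assumes 1-second windows at 16kHz with a hypothetical 0.5-second hop, and zero-pads the last window as silence. Each window would then be passed to the classifier separately.

```python
def sliding_windows(samples, sample_rate=16_000, window_s=1.0, hop_s=0.5):
    """Split a waveform into fixed-length, overlapping windows.

    The final window is zero-padded (treated as silence) so every
    window has exactly window_s * sample_rate samples.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    windows = []
    start = 0
    while True:
        chunk = list(samples[start:start + window])
        chunk += [0.0] * (window - len(chunk))
        windows.append(chunk)
        # Stop once this window reaches the end of the input.
        if start + window >= len(samples):
            break
        start += hop
    return windows


# 2.5 seconds of placeholder audio at 16 kHz.
audio = [0.01] * 40_000
chunks = sliding_windows(audio)
print(len(chunks), len(chunks[0]))
```

Overlapping windows reduce the chance that a keyword is split across a window boundary, at the cost of running the classifier more often.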

Audio Classification

Transformers English

Wav2vec2 Large Xlsr Turkish Artificial Cv

This is a Turkish automatic speech recognition model based on the XLSR Wav2Vec2 architecture, fine-tuned on the Common Voice Turkish dataset.

Speech Recognition Other

Wav2vec2 Large Superb Er

This is an emotion recognition model based on the Wav2Vec2-Large model, specifically designed to identify emotion categories from speech.

Audio Classification

Transformers English

Sew D Mid 400k Ft Ls100h

SEW-D-mid is a pre-trained speech model from ASAPP Research for automatic speech recognition, designed to balance performance and efficiency.

Speech Recognition

Transformers English

Wav2vec2 Large Robust Ft Swbd 300h

This model is a fine-tuned version of Facebook's Wav2Vec2-Large-Robust, optimized for telephone speech recognition by fine-tuning on 300 hours of the Switchboard telephone speech corpus.

Speech Recognition

Transformers English

Hubert Base Superb Sid

A Hubert-based speaker identification model optimized for the SUPERB benchmark.

Speaker Analysis

Transformers English

Wave2vec2 Large Xlsr Hindi

A Hindi speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the OpenSLR and Common Voice Hindi datasets; it expects audio sampled at 16kHz.
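As the page heading indicates, these models expect 16kHz input, so audio recorded at other rates (for example 8 kHz telephone audio or 44.1 kHz recordings) has to be resampled first. The sketch below uses plain linear interpolation to stay dependency-free. This is a deliberate simplification for illustration, not what any particular library does, and because it applies no anti-aliasing filter it is only reasonable for upsampling.

```python
def resample_linear(samples, orig_sr, target_sr=16_000):
    """Resample a mono waveform by linear interpolation (no anti-alias filter)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if orig_sr == target_sr or not samples:
        return list(samples)
    n_out = round(len(samples) * target_sr / orig_sr)
    step = orig_sr / target_sr  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last input sample: hold the final value.
            out.append(samples[-1])
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# One second of hypothetical 8 kHz telephone audio becomes 16,000 samples.
telephone = [0.0] * 8_000
print(len(resample_linear(telephone, orig_sr=8_000)))  # 16000
```

For real pipelines, a band-limited resampler such as `torchaudio.functional.resample` or `librosa.resample` is a safer choice, especially when downsampling.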

Speech Recognition

Transformers Other

Wav2vec2 Large Superb Ic

An intent classification model based on Wav2Vec2-Large-LV60, fine-tuned on the SUPERB intent classification task to recognize the intent of spoken commands.

Audio Classification

Transformers English

Sew D Tiny 100k Ft Ls100h

SEW-D-tiny is an efficient pre-trained speech recognition model from ASAPP Research, designed to balance performance and efficiency.

Speech Recognition

Transformers English

Hubert Base Superb Er

An emotion recognition model based on the Hubert-Base architecture, trained on the SUPERB emotion recognition task to classify emotions in speech.

Audio Classification

Transformers English

Wav2vec2 Base Superb Ic

An intent classification model based on Wav2Vec2-base that recognizes intents in voice commands by classifying speech segments into predefined intent categories.

Audio Classification

Transformers English

Sew Tiny 100k Ft Ls100h

SEW (Squeezed and Efficient Wav2vec) is a pre-trained speech recognition model from ASAPP Research that outperforms wav2vec 2.0 in both performance and efficiency.

Speech Recognition

Transformers Supports Multiple Languages

Sew D Mid K127 400k Ft Ls100h

SEW-D-mid-k127 is an efficient pre-trained speech recognition model from ASAPP Research, with significant gains in performance and efficiency over wav2vec 2.0.

Speech Recognition

Transformers English

Wav2vec2 Large Xlsr 53 German

Large-scale German automatic speech recognition (ASR) model based on Facebook's Wav2Vec2 architecture, fine-tuned on the Common Voice German dataset

Speech Recognition German

Wav2vec2 Large Superb Sid

Speaker identification model based on the Wav2Vec2-Large architecture, trained on the VoxCeleb1 dataset for classifying speech by speaker identity

Speaker Analysis

Transformers English

Hubert Large Superb Er

An emotion recognition model based on the Hubert-Large pre-trained model that predicts emotion categories in speech.

Audio Classification

Transformers English

Wav2vec2 Large Xlsr Bengali

A Bengali automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on 40,000 speech samples from the OpenSLR dataset.

Speech Recognition Other

Unispeech Sat Base 100h Libri Ft

An automatic speech recognition model based on the UniSpeech-SAT base model, fine-tuned on 100 hours of LibriSpeech data

Speech Recognition

Transformers English


© 2025 AIbase