# English Speech Recognition

Parakeet Tdt 0.6b V2 Onnx

NVIDIA Parakeet TDT 0.6B V2 is an automatic speech recognition (ASR) model for English speech-to-text.

Speech Recognition English

Whisper Custom Small

A small speech recognition model based on the OpenAI Whisper architecture, focused on English speech-to-text tasks.

Speech Recognition English

Wav2vec2 Tellmate

A speech recognition model optimized for chess coordinate recognition, fine-tuned on nearly 2,500 English audio files of spoken chess coordinates.

Speech Recognition

Transformers Supports Multiple Languages

Moonshine is a series of automatic speech recognition (ASR) models developed by Useful Sensors, specifically designed for English speech transcription, excelling on resource-constrained platforms.

Speech Recognition

Transformers English

Whisper Base.en

Whisper is a general-purpose speech recognition model trained by OpenAI with large-scale weak supervision. The Whisper family supports transcription in multiple languages; the base.en checkpoint is the English-only variant.

Speech Recognition

Whisper Medicalv1

Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focusing on English speech recognition, offering faster inference speeds while maintaining accuracy close to the original model.

Speech Recognition English

Wav2vec2 Bert CV16 En

An automatic speech recognition (ASR) model based on w2v-bert-2.0, fine-tuned on the Common Voice 16.0 English dataset.

Speech Recognition

Transformers English

Distil Small.en

Distil-Whisper is a distilled version of the Whisper model that is 6x faster and 49% smaller, while staying within 1% WER of the original on out-of-distribution evaluation sets.

Speech Recognition

Transformers English

Faster Whisper Small.en

A CTranslate2 conversion of OpenAI's Whisper small.en model for efficient speech recognition.

Speech Recognition English

Distil Medium.en

Distil-Whisper is a distilled version of the Whisper model, 6 times faster than the original, with a 49% reduction in size, while maintaining performance close to the original in English speech recognition tasks.

Speech Recognition English

Distil Large V2

Distil-Whisper is a distilled version of the Whisper model, achieving 6x speedup and 49% size reduction with only a 1% WER difference on out-of-distribution evaluation sets.

Speech Recognition English

Wav2vec2 Base 960h

An ONNX conversion of Facebook's wav2vec2-base-960h model, built for Transformers.js to support in-browser speech recognition.

Speech Recognition

Wav2vec2 Large Xlsr 53 English

A large speech recognition model based on the wav2vec 2.0 architecture for English speech-to-text conversion.

Speech Recognition

Exp W2v2t En Unispeech Sat S459

An English speech recognition model fine-tuned based on Microsoft's UniSpeech-SAT-Large model, supporting 16kHz sampled audio input.

Speech Recognition

Transformers English
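Several entries in this list state that they expect audio sampled at 16 kHz. As a minimal sketch of how that requirement could be checked before transcription, the snippet below reads the sample rate of a WAV file with Python's standard `wave` module. The helper names (`wav_sample_rate`, `needs_resampling`, `make_silent_wav`) and the in-memory test clip are illustrative assumptions and are not part of any model listed here.

```python
import io
import wave

# Sample rate the 16 kHz entries in this list expect.
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Return the sample rate (Hz) stored in the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(wav_bytes: bytes, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """True when the clip's sample rate differs from the expected rate."""
    return wav_sample_rate(wav_bytes) != expected_hz


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Hypothetical helper: build a short mono 16-bit silent clip in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(rate_hz)
        wav_file.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buffer.getvalue()


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        clip = make_silent_wav(rate)
        print(rate, "needs resampling:", needs_resampling(clip))
```

A clip that fails this check would need to be resampled with an audio library of your choice before being passed to the model; the resampling step itself is out of scope for this sketch.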

Wav2vec2 Large Xlsr 53 Enlgish FT ASCEND Colab

A speech recognition model fine-tuned from jonatasgrosman/wav2vec2-large-xlsr-53-english on the ascend dataset.

Speech Recognition

Assignment1 Omar

Wav2Vec2 is a self-supervised learning-based speech recognition model, pre-trained and fine-tuned on 960 hours of LibriSpeech audio data, supporting English speech transcription.

Speech Recognition

Transformers English

Classroom-workshop

Xtreme S Xlsr 300m Voxpopuli En

A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - VOXPOPULI.EN dataset, supporting English speech-to-text tasks.

Speech Recognition

Transformers English

Ascend With English

An English speech recognition model, derived from the ascend model and fine-tuned on the timit_asr dataset.

Speech Recognition

Wav2vec2 2 Gpt2 Regularisation

This is an automatic speech recognition (ASR) model trained on the LibriSpeech dataset, capable of converting English speech into text.

Speech Recognition

An automatic speech recognition model fine-tuned on the English Common Voice dataset based on facebook/wav2vec2-large-xlsr-53, supporting English speech input at 16kHz sampling rate.

Speech Recognition English

Wav2vec2 Base Superb Sv

This is a speaker verification model based on the Wav2Vec2 architecture, specifically designed for the speaker verification task in the SUPERB benchmark.

Speaker Analysis

Transformers English

This model is a fine-tuned English Automatic Speech Recognition (ASR) model based on facebook/wav2vec2-base, achieving a word error rate of 0.3397 on the evaluation set.

Speech Recognition
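The entries in this list report word error rate (WER) both as a fraction (for example 0.3397) and as a percentage (for example 21.96%). As a rough sketch of what the metric measures, the snippet below computes WER as the word-level edit distance between a reference transcript and a model output, divided by the number of reference words. The function names and the sample sentences are illustrative assumptions; the listed models may have been evaluated with different text normalization, so this will not necessarily reproduce their reported figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref_words)


def as_percent(wer: float) -> str:
    """Format a fractional WER as a percentage string with two decimals."""
    return f"{wer:.2%}"


if __name__ == "__main__":
    # Illustrative sentences only: one substitution and one deletion
    # against a six-word reference.
    score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(score, as_percent(score))
```

Under this definition a fractional value such as 0.3397 and a percentage such as 33.97% describe the same error rate, which makes the figures quoted across entries comparable once they are put in the same unit.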

Wav2vec2 Base Timit Asr

A speech recognition model based on facebook/wav2vec2-base, fine-tuned on the timit_asr dataset and supporting 16 kHz sampled audio input.

Speech Recognition

Transformers English

Wav2vec2 2 Bert Large

An automatic speech recognition (ASR) model trained on the LibriSpeech dataset for converting English speech to text.

Speech Recognition

Wavlm Base Libri Clean 100

An automatic speech recognition model based on the WavLM architecture, fine-tuned on the LibriSpeech CLEAN dataset (100 hours).

Speech Recognition

anjulRajendraSharma

Wav2vec2 Large Lv60 Timit

A speech recognition model fine-tuned on the TIMIT dataset based on facebook/wav2vec2-large-lv60, supporting 16kHz sampled speech input.

Speech Recognition English

Xlsr Wav2vec English

An automatic speech recognition model fine-tuned on the Common Voice dataset for English, based on facebook/wav2vec2-large, supporting 16kHz sampled audio input.

Speech Recognition

Transformers English

Wav2vec2 2 Roberta Large No Adapter Frozen Enc

This model is a speech recognition model trained on the LibriSpeech ASR dataset, capable of converting speech to text.

Speech Recognition

Wav2vec2 Base 100h

The base version of the Wav2Vec2 speech recognition model, trained on 100 hours of LibriSpeech data.

Speech Recognition

Transformers English

An English automatic speech recognition (ASR) model fine-tuned based on microsoft/wavlm-base, trained on the english_ASR - CLEAN dataset with a word error rate (WER) of 0.0773.

Speech Recognition

anjulRajendraSharma

Wav2vec2 Librispeech Clean 100h Demo Dist

A speech recognition model based on facebook/wav2vec2-large-lv60, fine-tuned on the LIBRISPEECH_ASR-CLEAN dataset.

Speech Recognition

patrickvonplaten

An English fine-tuned speech recognition model based on facebook/wav2vec2-large, using the Common Voice dataset, supporting 16kHz sampled audio input.

Speech Recognition

Unispeech Large 1500h Cv Timit

This model is an automatic speech recognition model fine-tuned on the TIMIT_ASR dataset based on microsoft/unispeech-large-1500h-cv, achieving a word error rate (WER) of 21.96% on the evaluation set.

Speech Recognition

patrickvonplaten

Wav2vec2 Xls R 300m English

XLS-R-300M is an English automatic speech recognition model fine-tuned on the librispeech_asr dataset based on facebook/wav2vec2-xls-r-300m, achieving a word error rate of 12.29% on the LibriSpeech test set.

Speech Recognition

Transformers English

© 2025 AIbase