# English Speech Recognition
Parakeet Tdt 0.6b V2 Onnx
NVIDIA Parakeet TDT 0.6B V2 is an automatic speech recognition (ASR) model for English speech-to-text transcription.
Speech Recognition English
istupakov
129
3
Whisper Custom Small
Apache-2.0
A small speech recognition model based on the OpenAI Whisper architecture, focused on English speech-to-text tasks.
Speech Recognition English
gyrroa
15
1
Wav2vec2 Tellmate
Apache-2.0
A speech recognition model optimized for chess coordinate recognition, fine-tuned on nearly 2,500 English audio files of spoken chess coordinates.
Speech Recognition
Transformers Supports Multiple Languages

leomino
27
1
Moonshine Base
MIT
Moonshine is a series of automatic speech recognition (ASR) models developed by Useful Sensors, specifically designed for English speech transcription, excelling on resource-constrained platforms.
Speech Recognition
Transformers English

UsefulSensors
6,857
32
Whisper Base.en
Whisper is a general-purpose speech recognition model trained by OpenAI with large-scale weak supervision. The base.en checkpoint is the English-only variant; multilingual transcription is provided by the checkpoints without the .en suffix.
Speech Recognition
Transformers

onnx-community
76
1
Whisper Medicalv1
MIT
Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focusing on English speech recognition, offering faster inference speeds while maintaining accuracy close to the original model.
Speech Recognition English
Crystalcareai
348
11
Wav2vec2 Bert CV16 En
An automatic speech recognition (ASR) model based on w2v-bert-2.0, fine-tuned on the Common Voice 16.0 English dataset.
Speech Recognition
Transformers English

hf-audio
1,700
8
Distil Small.en
MIT
Distil-Whisper is a distilled version of the Whisper model: 6x faster and 49% smaller, while staying within 1% WER of the original on out-of-distribution evaluation sets.
Speech Recognition
Transformers English

distil-whisper
33.51k
97
Faster Whisper Small.en
MIT
A CTranslate2 conversion of the OpenAI Whisper small.en model for efficient speech recognition.
Speech Recognition English
Systran
129.26k
4
Distil Medium.en
MIT
Distil-Whisper is a distilled version of the Whisper model, 6 times faster than the original, with a 49% reduction in size, while maintaining performance close to the original in English speech recognition tasks.
Speech Recognition English
distil-whisper
186.85k
120
Distil Large V2
MIT
Distil-Whisper is a distilled version of the Whisper model, achieving 6x speedup and 49% size reduction with only a 1% WER difference on out-of-distribution evaluation sets.
Speech Recognition English
distil-whisper
42.65k
508
Wav2vec2 Base 960h
An ONNX conversion of Facebook's wav2vec2-base-960h model, packaged for Transformers.js to support in-browser speech recognition.
Speech Recognition
Transformers

Xenova
117
3
Wav2vec2 Large Xlsr 53 English
Large-scale speech recognition model based on the wav2vec 2.0 architecture, supporting English speech-to-text conversion
Speech Recognition
Transformers

Xenova
14
2
Exp W2v2t En Unispeech Sat S459
Apache-2.0
An English speech recognition model fine-tuned from Microsoft's UniSpeech-SAT-Large model; it expects audio input sampled at 16 kHz.
Speech Recognition
Transformers English

jonatasgrosman
22
0
Wav2vec2 Large Xlsr 53 Enlgish FT ASCEND Colab
Apache-2.0
A speech recognition model fine-tuned from jonatasgrosman/wav2vec2-large-xlsr-53-english on the ASCEND dataset.
Speech Recognition
Transformers

Ryna
16
0
Assignment1 Omar
Apache-2.0
Wav2Vec2 is a self-supervised learning-based speech recognition model, pre-trained and fine-tuned on 960 hours of LibriSpeech audio data, supporting English speech transcription.
Speech Recognition
Transformers English

Classroom-workshop
28
0
Xtreme S Xlsr 300m Voxpopuli En
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the GOOGLE/XTREME_S - VOXPOPULI.EN dataset for English speech-to-text tasks.
Speech Recognition
Transformers English

anton-l
28
0
Ascend With English
An English speech recognition model obtained by fine-tuning the ascend model on the timit_asr dataset.
Speech Recognition
Transformers

GleamEyeBeast
23
0
Wav2vec2 2 Gpt2 Regularisation
This is an automatic speech recognition (ASR) model trained on the LibriSpeech dataset, capable of converting English speech into text.
Speech Recognition
Transformers

sanchit-gandhi
20
0
Speech Text
Apache-2.0
An automatic speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the English Common Voice dataset; it expects English speech input sampled at 16 kHz.
Speech Recognition English
abidlabs
25
0
Wav2vec2 Base Superb Sv
Apache-2.0
A speaker verification model based on the Wav2Vec2 architecture, built for the speaker verification (SV) task of the SUPERB benchmark.
Speaker Analysis
Transformers English

anton-l
901
3
English ASR
Apache-2.0
An English automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-base, achieving a word error rate (WER) of 0.3397 (33.97%) on the evaluation set.
Speech Recognition
Transformers

maher13
13
0
Wav2vec2 Base Timit Asr
Apache-2.0
A speech recognition model based on facebook/wav2vec2-base, fine-tuned on the timit_asr dataset; it expects audio input sampled at 16 kHz.
Speech Recognition
Transformers English

elgeish
174
0
Wav2vec2 2 Bert Large
Automatic Speech Recognition (ASR) model trained on LibriSpeech dataset for converting English speech to text
Speech Recognition
Transformers

speech-seq2seq
17
0
Wavlm Base Libri Clean 100
Automatic speech recognition model based on the WavLM architecture, fine-tuned on the LibriSpeech CLEAN dataset (100 hours)
Speech Recognition
Transformers

anjulRajendraSharma
73
0
Wav2vec2 Large Lv60 Timit
Apache-2.0
A speech recognition model fine-tuned on the TIMIT dataset based on facebook/wav2vec2-large-lv60, supporting 16kHz sampled speech input.
Speech Recognition English
harshit345
21
1
Xlsr Wav2vec English
Apache-2.0
An automatic speech recognition model fine-tuned on the Common Voice dataset for English, based on facebook/wav2vec2-large, supporting 16kHz sampled audio input.
Speech Recognition
Transformers English

harshit345
27
0
Wav2vec2 2 Roberta Large No Adapter Frozen Enc
This model is a speech recognition model trained on the LibriSpeech ASR dataset, capable of converting speech to text.
Speech Recognition
Transformers

speech-seq2seq
27
0
Wav2vec2 Base 100h
Apache-2.0
Wav2Vec2 base version speech recognition model trained on 100 hours of LibriSpeech data
Speech Recognition
Transformers English

vuiseng9
26
0
Wavlm Base En
An English automatic speech recognition (ASR) model fine-tuned from microsoft/wavlm-base on the english_ASR - CLEAN dataset, with a word error rate (WER) of 0.0773 (7.73%).
Speech Recognition
Transformers

anjulRajendraSharma
17
0
Wav2vec2 Librispeech Clean 100h Demo Dist
Apache-2.0
A speech recognition model fine-tuned on the LIBRISPEECH_ASR-CLEAN dataset based on facebook/wav2vec2-large-lv60
Speech Recognition
Transformers

patrickvonplaten
15
0
English Model
An English speech recognition model based on facebook/wav2vec2-large, fine-tuned on the Common Voice dataset; it expects audio input sampled at 16 kHz.
Speech Recognition
Transformers

tanmayplanet32
30
0
Unispeech Large 1500h Cv Timit
An automatic speech recognition model based on microsoft/unispeech-large-1500h-cv, fine-tuned on the TIMIT_ASR dataset and achieving a word error rate (WER) of 21.96% on the evaluation set.
Speech Recognition
Transformers

patrickvonplaten
536
0
Wav2vec2 Xls R 300m English
Apache-2.0
XLS-R-300M is an English automatic speech recognition model fine-tuned on the librispeech_asr dataset based on facebook/wav2vec2-xls-r-300m, achieving a word error rate of 12.29% on the LibriSpeech test set.
Speech Recognition
Transformers English

vitouphy
21
3
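
Several entries above report a word error rate (WER), some as a fraction (0.3397, 0.0773) and some as a percentage (21.96%, 12.29%). These are the same metric on different scales. The sketch below shows how WER is commonly computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name, the lowercase-and-split normalization, and the example sentences are assumptions made for illustration; individual model cards may normalize text differently, so their published numbers are not guaranteed to be reproducible with this exact code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from any model listed above.
reference = "knight to e four takes pawn"
hypothesis = "night to e four takes the pawn"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.4f} ({wer:.2%})")
```

Here one substitution ("knight" to "night") and one insertion ("the") against six reference words give a WER of 2/6. Because insertions count as errors, WER can exceed 1.0 (100%) when the hypothesis is much longer than the reference.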