# Speech-to-Text

Whisper Large V3 Turbo
An ONNX-optimized version of the Whisper Large V3 Turbo speech recognition model, designed for web deployment.
Speech Recognition Transformers
onnx-community
2,988
54
W2V2 BERT Withlm Malayalam
MIT
A Malayalam automatic speech recognition model fine-tuned from facebook/w2v-bert-2.0 on multiple Malayalam datasets, using a trigram language model trained with the KenLM library.
Speech Recognition Transformers Other
vrclc
65
3
Whisper Base
Whisper is an automatic speech recognition (ASR) system trained by OpenAI, supporting multilingual speech transcription.
Speech Recognition Transformers
onnx-community
5,704
19
WHISPER SMALL SWAHILI ASR CV 14
Apache-2.0
A speech recognition model fine-tuned from OpenAI's Whisper large on the Common Voice 14.0 Swahili (sw) dataset, achieving a word error rate (WER) of 25.13%.
Speech Recognition Transformers Other
dmusingu
28
2
Faster Distil Whisper Large V3
MIT
Distilled version of Whisper Large v3 for efficient automatic speech recognition (ASR)
Speech Recognition English
Systran
18.55k
49
Whisper Tiny
Apache-2.0
A conversion of openai/whisper-tiny from the GGML format to Ratchet's custom format.
Speech Recognition
FL33TW00D-HF
17.21k
5
Audiosangraha Audio To Text
Apache-2.0
A speech-to-text model fine-tuned from openai/whisper-small, supporting audio translation and text generation tasks.
Speech Recognition Transformers
AqeelShafy7
224
4
Whisper Small Ml
Apache-2.0
A fine-tuned version of openai/whisper-small for automatic speech recognition, supporting multiple languages.
Speech Recognition Transformers
kavyamanohar
23
2
Speecht5 Tts Marathi
This is a model for Marathi speech processing, potentially involving speech recognition or speech synthesis tasks.
Speech Recognition Transformers
Patil
26
0
Whisper Medium
Whisper Medium is a medium-scale speech recognition model developed by OpenAI, supporting automatic speech recognition (ASR) tasks in multiple languages.
Speech Recognition Transformers
Xenova
871
4
Whisper Small
Whisper Small is a small automatic speech recognition (ASR) model developed by OpenAI, capable of converting speech into text.
Speech Recognition Transformers
Xenova
1,716
9
Whisper Base
Whisper is an automatic speech recognition (ASR) system trained by OpenAI, supporting speech-to-text tasks in multiple languages.
Speech Recognition Transformers
Xenova
6,204
7
Whisper Tiny
Whisper Tiny is a lightweight speech recognition model open-sourced by OpenAI, suitable for web deployment.
Speech Recognition Transformers
Xenova
21.70k
8
Speecht5 Asr
MIT
A SpeechT5 automatic speech recognition model fine-tuned on the LibriSpeech dataset, supporting speech-to-text conversion.
Speech Recognition Transformers
microsoft
12.30k
41
Whisper Base
Apache-2.0
Whisper is a pre-trained automatic speech recognition (ASR) and speech translation model, trained on 680k hours of labeled data with strong generalization capabilities.
Speech Recognition Supports Multiple Languages
openai
491.35k
216
Wav2vec2 Xls R 300m Mrbrown Finetune1
Apache-2.0
A speech recognition model fine-tuned from the pre-trained facebook/wav2vec2-xls-r-300m model on the uob_singlish dataset.
Speech Recognition Transformers
RuiqianLi
18
0
Wav2vec2 Large Xls R 300m Turkish Colab
Apache-2.0
A Turkish speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice Turkish dataset, achieving a word error rate of 30.95% on the evaluation set.
Speech Recognition Transformers
dennisowusuk
15
0
Wav2vec2 Large 960h
Apache-2.0
Wav2Vec2 is a speech recognition model developed by Facebook. It learns speech representations from raw audio through self-supervised learning and is fine-tuned on the LibriSpeech dataset to achieve high-accuracy speech transcription.
Speech Recognition Transformers English
facebook
77.59k
29
Wav2vec2 2 Bart Base
A speech recognition model fine-tuned on the LibriSpeech ASR clean dataset, based on wav2vec2-base and bart-base
Speech Recognition Transformers
patrickvonplaten
493
5
Bp Cetuc100 Xlsr
Apache-2.0
Wav2vec2 model fine-tuned for Brazilian Portuguese using the CETUC dataset, trained with approximately 145 hours of Brazilian Portuguese speech data
Speech Recognition Transformers Other
lgris
22
0
Asr Hubert Cluster Bart Base
Apache-2.0
An automatic speech recognition model based on the HuBERT and BART architectures, achieving speech-to-text conversion through clustered feature transformation.
Speech Recognition Transformers Supports Multiple Languages
voidful
13
0
Wav2vec2 Large Xls R 300m Ar
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice Arabic dataset.
Speech Recognition Transformers
ayameRushia
18
0
Wav2vec2 Tiny Random
A lightweight randomly initialized Wav2Vec2 model for speech recognition, primarily for testing and development purposes
Speech Recognition Transformers
patrickvonplaten
2,988
1
Wav2vec Osr
Apache-2.0
A fine-tuned Facebook wav2vec2 model for the speech-to-text module of The Sound of AI Open Source Research Group
Speech Recognition Transformers English
iamtarun
22
1
Wav2vec2 Xls R 300m Kh
A baseline model for Khmer automatic speech recognition (ASR).
Speech Recognition Transformers
kongkeaouch
21
0
Wav2vec2 2 Bart Large
An automatic speech recognition (ASR) model based on wav2vec2-large-lv60 and bart-large, fine-tuned on the librispeech_asr-clean dataset.
Speech Recognition Transformers
patrickvonplaten
31
5
Wav2vec2 Large 100k Voxpopuli Ft Common Voice Plus TTS Dataset Russian
Apache-2.0
This is a speech recognition model based on Facebook's wav2vec2-large-100k-voxpopuli, fine-tuned using Common Voice 7.0 and M-AILABS Russian data.
Speech Recognition Transformers Other
Edresson
25
6
Waynehills STT Doogie Server
Apache-2.0
A fine-tuned speech recognition model based on Doogie/Waynehills-STT-doogie-server
Speech Recognition Transformers
Waynehillsdev
28
0
Xls R 300m Ur Cv8 Hi
Apache-2.0
This is an Urdu automatic speech recognition model based on the wav2vec2 architecture, fine-tuned on the Common Voice 8.0 Urdu dataset
Speech Recognition Transformers Other
HarrisDePerceptron
16
0