# High-precision ASR

## Quantum STT

Quantum_STT is an advanced automatic speech recognition (ASR) and speech translation model trained with large-scale weak supervision; it supports multiple languages and tasks.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Multilingual · Author: sbapan41 · Downloads: 100 · Likes: 1
## GigaAM RNNT

GigaAM-v2-RNNT is a Russian automatic speech recognition (ASR) model based on the RNN-Transducer (RNNT) architecture, suitable for speech-to-text tasks.

License: MIT · Tags: Speech Recognition, Transformers, Other · Author: waveletdeboshir · Downloads: 70 · Likes: 1
## Whisper Large V3 Persian Common Voice 17

A Persian automatic speech recognition model fine-tuned from Whisper Large v3 on the Common Voice 17 dataset, which contains over 250,000 Persian audio samples. The fine-tuning is reported to significantly improve recognition accuracy and robustness.

License: Apache-2.0 · Tags: Speech Recognition, Transformers · Author: MohammadGholizadeh · Downloads: 978 · Likes: 3
## Canary 1B Flash

NVIDIA NeMo Canary Flash is a family of multilingual, multitask models that achieves state-of-the-art performance on multiple speech benchmarks. It supports automatic speech recognition and speech translation in four languages.

Tags: Speech Recognition, Multilingual · Author: nvidia · Downloads: 125.22k · Likes: 186
## Phi 4 Multimodal Instruct Ko ASR

A Korean automatic speech recognition (ASR) and automatic speech translation (AST) model fine-tuned from microsoft/Phi-4-multimodal-instruct, with strong reported results on the zeroth-korean and fleurs datasets.

Tags: Text-to-Audio, Transformers, Korean · Author: junnei · Downloads: 354 · Likes: 3
## Whisper Large V3

A fine-tuned version of OpenAI's Whisper Large v3 model for Hebrew audio transcription.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: ivrit-ai · Downloads: 2,068 · Likes: 3
## ArTST ASR V3 QASR

An Arabic automatic speech recognition model fine-tuned on the QASR dataset and adapted specifically for dialectal Arabic.

License: MIT · Tags: Speech Recognition, Transformers, Multilingual · Author: MBZUAI · Downloads: 636 · Likes: 1
## Vi Whisper Large V3 Turbo V1

A Whisper Large v3 Turbo model fine-tuned on multiple Vietnamese datasets and optimized for Vietnamese automatic speech recognition (ASR).

Tags: Speech Recognition, Transformers, Other · Author: suzii · Downloads: 182 · Likes: 7
## ASR Streaming Conformer GigaSpeech

An English automatic speech recognition model trained on the GigaSpeech dataset that supports both streaming and non-streaming transcription.

License: Apache-2.0 · Tags: Speech Recognition, English · Author: speechbrain · Downloads: 66 · Likes: 4
## Parakeet TDT-CTC 110M

An English speech recognition model jointly developed by NVIDIA NeMo and Suno.ai. It is based on the FastConformer-TDT-CTC architecture and outputs text with punctuation and capitalization.

Tags: Speech Recognition, English · Author: nvidia · Downloads: 50.47k · Likes: 28
## Whisper Large V3 Ca 3CatParla

An automatic speech recognition model optimized for Catalan, fine-tuned from OpenAI's Whisper-large-v3 and developed by the Barcelona Supercomputing Center.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: projecte-aina · Downloads: 122 · Likes: 4
## Parakeet TDT-CTC 0.6B Ja

Parakeet TDT-CTC 0.6B is an automatic speech recognition (ASR) model developed by the NVIDIA NeMo team that transcribes Japanese speech with punctuation.

Tags: Speech Recognition, Japanese · Author: nvidia · Downloads: 4,184 · Likes: 22
## ASR Streaming Conformer LibriSpeech

An end-to-end English automatic speech recognition system trained on the LibriSpeech dataset that supports both streaming and non-streaming modes.

License: Apache-2.0 · Tags: Speech Recognition, English · Author: speechbrain · Downloads: 304 · Likes: 10
## Canary 1B

Canary-1B is a multilingual, multitask model developed by NVIDIA NeMo that supports automatic speech recognition and speech translation in English, German, French, and Spanish.

Tags: Speech Recognition, Multilingual · Author: nvidia · Downloads: 7,734 · Likes: 421
## NB Whisper Large Verbatim

A Norwegian automatic speech recognition model based on OpenAI Whisper, with additional training for verbatim transcription in lowercase and without punctuation.

License: Apache-2.0 · Tags: Speech Recognition, Multilingual · Author: NbAiLabBeta · Downloads: 765 · Likes: 2
## Whisper Large V3

Whisper is an advanced automatic speech recognition (ASR) and speech translation model from OpenAI. Trained on more than 5 million hours of labeled data, it generalizes well across datasets and domains.

License: Apache-2.0 · Tags: Speech Recognition, Multilingual · Author: openai · Downloads: 4.6M · Likes: 4,321
## STT UA FastConformer Hybrid Large PC

NVIDIA FastConformer-Hybrid Large (ua) is a Ukrainian speech recognition model with approximately 115 million parameters. It is a hybrid model trained jointly with two loss functions: Transducer and CTC.

Tags: Speech Recognition · Author: nvidia · Downloads: 381 · Likes: 4
## SpeechT5 ASR

A SpeechT5 automatic speech recognition model fine-tuned on the LibriSpeech dataset for speech-to-text conversion.

License: MIT · Tags: Speech Recognition, Transformers · Author: microsoft · Downloads: 12.30k · Likes: 41
## Whisper Large V2 Mn 13

A Mongolian automatic speech recognition model fine-tuned on Mongolian datasets from OpenAI's whisper-large-v2.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: bayartsogt · Downloads: 161 · Likes: 6
## Whisper Th Medium Combined

A Thai automatic speech recognition model fine-tuned from openai/whisper-medium on an enhanced Thai dataset.

License: Apache-2.0 · Tags: Speech Recognition, Transformers · Author: biodatlab · Downloads: 4,167 · Likes: 17
## Whisper Medium Ko Zeroth

A Korean speech recognition model fine-tuned from OpenAI's Whisper Medium on the Zeroth Korean dataset, with a reported word error rate (WER) of 3.64%.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Korean · Author: seastar105 · Downloads: 154 · Likes: 16
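Several entries in this list quote a word error rate (WER). WER is the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to turn a model's hypothesis into the reference transcript, divided by the number of reference words N, that is WER = (S + D + I) / N. The following is a minimal Python sketch of that definition as a word-level Levenshtein distance. It assumes plain whitespace tokenization with no lowercasing or punctuation stripping, and `word_error_rate` is a hypothetical helper name; published figures may apply their own text normalization, so this sketch will not necessarily reproduce them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes whitespace tokenization only (no lowercasing, no punctuation removal).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Rolling one-row DP table: at the start of iteration i, prev[j] is the
    # edit distance between the first i-1 reference words and the first j
    # hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {word_error_rate(ref, hyp):.4f}")
```

Run as a script, it prints `WER = 0.3333`: one substitution ("sit" for "sat") and one deletion (the second "the") over six reference words. Because insertions also count against N, WER is not capped at 1.0, and a value such as 0.0773 is the same as 7.73%.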
## Exp W2v2t Zh-CN WavLM S596

A Simplified Chinese speech recognition model fine-tuned from microsoft/wavlm-large on the Common Voice 7.0 (zh-CN) dataset.

License: Apache-2.0 · Tags: Speech Recognition, Transformers · Author: jonatasgrosman · Downloads: 22 · Likes: 1
## Exp W2v2t It VP 100k S449

An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-100k-voxpopuli on the Common Voice 7.0 Italian dataset.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: jonatasgrosman · Downloads: 17 · Likes: 0
## Exp W2v2t It Wav2vec2 S609

An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the Common Voice 7.0 Italian dataset.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: jonatasgrosman · Downloads: 18 · Likes: 0
## Exp W2v2t Ja VP It S544

A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-it-voxpopuli on the training set of Common Voice 7.0 (Japanese).

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Japanese · Author: jonatasgrosman · Downloads: 18 · Likes: 0
## AI Light Dance Singing2 FT Wav2vec2 Large XLSR 53 V1

An automatic speech recognition model fine-tuned from wav2vec2-large-xlsr-53 on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset, intended mainly for singing voice recognition.

License: Apache-2.0 · Tags: Speech Recognition, Transformers · Author: gary109 · Downloads: 185 · Likes: 0
## First Model

A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the common_voice dataset, reported to achieve a low word error rate on its evaluation set.

License: Apache-2.0 · Tags: Speech Recognition, Transformers · Author: Vkt · Downloads: 26 · Likes: 0
## Wav2vec2 Large XLSR 53 Dutch

A Dutch automatic speech recognition (ASR) model from Facebook based on the Wav2Vec 2.0 architecture and fine-tuned from the multilingual XLSR-53 pretrained model.

License: Apache-2.0 · Tags: Speech Recognition, Other · Author: facebook · Downloads: 203 · Likes: 2
## Wav2vec2 Large XLS-R 300M Maltese

An automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-xls-r-300m on Maltese speech datasets.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: infinitejoy · Downloads: 19 · Likes: 0
## WavLM Base Libri Clean 100

An automatic speech recognition model based on the WavLM architecture and fine-tuned on the 100-hour clean subset of LibriSpeech.

Tags: Speech Recognition, Transformers · Author: anjulRajendraSharma · Downloads: 73 · Likes: 0
## Wav2vec2 Large XLSR 53 Italian

A large Italian automatic speech recognition model from Facebook, based on the Wav2Vec2 architecture and fine-tuned on the Common Voice dataset.

License: Apache-2.0 · Tags: Speech Recognition, Other · Author: facebook · Downloads: 4,013 · Likes: 6
## Wav2vec2 XLS-R 300M Bangla Command

A 300M-parameter Bengali speech recognition model based on the wav2vec2 XLS-R architecture and optimized for command recognition.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: sshasnain · Downloads: 28 · Likes: 2
## Wav2vec2 Base 10k VoxPopuli FT Cs

A speech recognition model based on Facebook's Wav2Vec2 architecture, pre-trained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on Czech transcribed data.

Tags: Speech Recognition, Transformers, Other · Author: facebook · Downloads: 226 · Likes: 0
## Wav2vec2 Large XLSR Welsh

An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Welsh Common Voice dataset, with a test WER of 29.4%.

License: Apache-2.0 · Tags: Speech Recognition, Other · Author: Srulikbdd · Downloads: 386 · Likes: 0
## XLS ASR Vi 40h 1B

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-1b on 40 hours of FPT Open Speech Dataset (FOSD) and Common Voice 7.0 data.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: geninhu · Downloads: 23 · Likes: 0
## WavLM Base En

An English automatic speech recognition (ASR) model fine-tuned from microsoft/wavlm-base on the english_ASR - CLEAN dataset, with a word error rate (WER) of 0.0773 (7.73%).

Tags: Speech Recognition, Transformers · Author: anjulRajendraSharma · Downloads: 17 · Likes: 0
## Wav2vec2 Base 10k VoxPopuli FT Fr

A speech recognition model based on Facebook's Wav2Vec2 architecture, pre-trained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on French transcribed data.

Tags: Speech Recognition, Transformers, French · Author: facebook · Downloads: 75 · Likes: 0
## Output

An automatic speech recognition (ASR) model fine-tuned from cahya/wav2vec2-base-turkish-artificial-cv on the Turkish Common Voice dataset.

License: Apache-2.0 · Tags: Speech Recognition, Transformers, Other · Author: cahya · Downloads: 23 · Likes: 0
## Wav2vec2 Large XLSR 53 Portuguese

A large Portuguese automatic speech recognition (ASR) model from Facebook, based on the Wav2Vec 2.0 architecture, for Portuguese speech-to-text.

License: Apache-2.0 · Tags: Speech Recognition, Other · Author: facebook · Downloads: 425 · Likes: 6
## Wav2vec2 Large XLSR 53 Polish

A Polish automatic speech recognition model from Facebook, based on the Wav2Vec2 architecture and the multilingual XLSR-53 pretrained model.

License: Apache-2.0 · Tags: Speech Recognition, Other · Author: facebook · Downloads: 174 · Likes: 3
© 2025 AIbase