# High-precision ASR

Quantum_STT is an automatic speech recognition (ASR) and speech translation model trained with large-scale weak supervision. It supports multiple languages and tasks.

Speech Recognition

Transformers Supports Multiple Languages

GigaAM-v2-RNNT is a Russian automatic speech recognition (ASR) model based on the RNNT architecture, suitable for speech-to-text tasks.

Speech Recognition

Transformers Other

waveletdeboshir

Whisper Large V3 Persian Common Voice 17

A Persian automatic speech recognition model fine-tuned from Whisper Large v3 on the Common Voice 17 dataset, which contains over 250,000 Persian audio samples. The fine-tuning is reported to improve recognition accuracy and robustness.

Speech Recognition

MohammadGholizadeh

Canary 1b Flash

NVIDIA NeMo Canary Flash is a family of multilingual multitask models that achieve state-of-the-art performance across multiple speech benchmarks. The models support automatic speech recognition and translation in four languages.

Speech Recognition Supports Multiple Languages

Phi 4 Multimodal Instruct Ko Asr

A Korean automatic speech recognition (ASR) and speech translation (AST) model fine-tuned from microsoft/Phi-4-multimodal-instruct, with strong reported results on the zeroth-korean and fleurs datasets.

Transformers Korean

Whisper Large V3

A version of OpenAI's Whisper Large v3 model fine-tuned specifically for Hebrew audio transcription.

Speech Recognition

Transformers Other

Artst Asr V3 Qasr

An Arabic automatic speech recognition model fine-tuned on the QASR dataset and specifically adapted for dialectal variants.

Speech Recognition

Transformers Supports Multiple Languages

Vi Whisper Large V3 Turbo V1

A Whisper-V3-Turbo model optimized for Vietnamese automatic speech recognition (ASR), fine-tuned on multiple Vietnamese datasets.

Speech Recognition

Transformers Other

Asr Streaming Conformer Gigaspeech

An English automatic speech recognition model pre-trained on the GigaSpeech dataset, supporting both streaming and non-streaming transcription.

Speech Recognition English

Parakeet Tdt Ctc 110m

An English speech recognition model jointly developed by NVIDIA NeMo and Suno.ai. It is based on the FastConformer-TDT-CTC architecture and transcribes speech with punctuation and capitalization.

Speech Recognition English

Whisper Large V3 Ca 3catparla

An automatic speech recognition model optimized for Catalan, fine-tuned from OpenAI's Whisper-large-v3 and developed by the Barcelona Supercomputing Center.

Speech Recognition

Transformers Other

Parakeet Tdt Ctc 0.6b Ja

Parakeet TDT-CTC 0.6B is an automatic speech recognition (ASR) model capable of transcribing Japanese speech with punctuation, developed by the NVIDIA NeMo team.

Speech Recognition Japanese

Asr Streaming Conformer Librispeech

This is an end-to-end automatic speech recognition system pre-trained on the LibriSpeech dataset, supporting both streaming and non-streaming modes, suitable for English speech recognition.

Speech Recognition English

Canary-1B is a multilingual multi-task model developed by NVIDIA NeMo, supporting automatic speech recognition and speech translation tasks in English, German, French, and Spanish.

Speech Recognition Supports Multiple Languages

Nb Whisper Large Verbatim

A Norwegian automatic speech recognition model built on OpenAI Whisper, with additional training for lowercase, punctuation-free verbatim transcription.

Speech Recognition Supports Multiple Languages

Whisper Large V3

Whisper is an advanced automatic speech recognition (ASR) and speech translation model proposed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.

Speech Recognition Supports Multiple Languages

Stt Ua Fastconformer Hybrid Large Pc

NVIDIA FastConformer-Hybrid Large (ua) is a hybrid model for Ukrainian speech recognition, trained jointly with two loss functions (Transducer and CTC), with approximately 115 million parameters.

Speech Recognition

A SpeechT5 automatic speech recognition model fine-tuned on the LibriSpeech dataset, supporting speech-to-text conversion.

Speech Recognition

Whisper Large V2 Mn 13

A Mongolian automatic speech recognition model fine-tuned from OpenAI's whisper-large-v2 on Mongolian datasets.

Speech Recognition

Transformers Other

Whisper Th Medium Combined

A Thai automatic speech recognition model fine-tuned from openai/whisper-medium on an enhanced Thai dataset.

Speech Recognition

Whisper Medium Ko Zeroth

A Korean speech recognition model fine-tuned from OpenAI's Whisper Medium on the Zeroth Korean dataset, with a reported word error rate (WER) of 3.64%.

Speech Recognition

Transformers Korean

Exp W2v2t Zh Cn Wavlm S596

A Chinese speech recognition model fine-tuned from microsoft/wavlm-large on the Common Voice 7.0 (zh-CN) dataset. It supports Simplified Chinese.

Speech Recognition

Exp W2v2t It Vp 100k S449

An Italian automatic speech recognition model fine-tuned from the facebook/wav2vec2-large-100k-voxpopuli model, trained using the Common Voice 7.0 Italian dataset.

Speech Recognition

Transformers Other

Exp W2v2t It Wav2vec2 S609

An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the Common Voice 7.0 Italian dataset.

Speech Recognition

Transformers Other

Exp W2v2t Ja Vp It S544

A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-it-voxpopuli on the training set of Common Voice 7.0 (Japanese).

Speech Recognition

Transformers Japanese

Ai Light Dance Singing2 Ft Wav2vec2 Large Xlsr 53 V1

An automatic speech recognition model fine-tuned from wav2vec2-large-xlsr-53 on the GARY109/AI_LIGHT_DANCE - ONSET-SINGING2 dataset, primarily intended for singing voice recognition.

Speech Recognition

A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the common_voice dataset, achieving a low word error rate on the evaluation set.

Speech Recognition

Wav2vec2 Large Xlsr 53 Dutch

A Dutch automatic speech recognition (ASR) model developed by Facebook, based on the Wav2Vec 2.0 architecture and fine-tuned from the XLSR-53 multilingual pretrained model.

Speech Recognition Other

Wav2vec2 Large Xls R 300m Maltese

An automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-xls-r-300m on Maltese speech datasets.

Speech Recognition

Transformers Other

Wavlm Base Libri Clean 100

An automatic speech recognition model based on the WavLM architecture, fine-tuned on the LibriSpeech CLEAN dataset (100 hours).

Speech Recognition

anjulRajendraSharma

Wav2vec2 Large Xlsr 53 Italian

A large-scale Italian automatic speech recognition model released by Facebook, based on the Wav2Vec2 architecture and fine-tuned on the Common Voice dataset.

Speech Recognition Other

Wav2vec2 Xls R 300m Bangla Command

This is a 300M-parameter Bengali speech recognition model based on the wav2vec2 XLS-R architecture, specifically optimized for command recognition tasks.

Speech Recognition

Transformers Other

Wav2vec2 Base 10k Voxpopuli Ft Cs

A speech recognition model based on Facebook's Wav2Vec2 architecture, pre-trained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on Czech transcription data.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Welsh

A Welsh automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Welsh Common Voice dataset, achieving a test WER of 29.4%.

Speech Recognition Other

Xls Asr Vi 40h 1B

A Vietnamese automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-1b on 40 hours of data from the FPT Open Speech Dataset (FOSD) and the Common Voice 7.0 dataset.

Speech Recognition

Transformers Other

An English automatic speech recognition (ASR) model fine-tuned from microsoft/wavlm-base on the english_ASR - CLEAN dataset, with a reported word error rate (WER) of 0.0773 (7.73%).

Speech Recognition

anjulRajendraSharma
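
Entries in this list quote word error rate either as a percentage (3.64%, 29.4%) or as a fraction (0.0773). As a rough illustration of what that figure measures, the sketch below computes WER as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` is illustrative and not taken from any of the listed models. The sketch also assumes whitespace tokenization with no text normalization, whereas published evaluations typically normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one dropped word ("the"): 2 errors over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.4f} as a fraction, {wer:.2%} as a percentage")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.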

Wav2vec2 Base 10k Voxpopuli Ft Fr

A speech recognition model based on Facebook's Wav2Vec2 architecture, pre-trained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on French transcription data.

Speech Recognition

Transformers French

A Turkish automatic speech recognition (ASR) model fine-tuned from cahya/wav2vec2-base-turkish-artificial-cv on the Turkish COMMON_VOICE dataset.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr 53 Portuguese

A large-scale Portuguese automatic speech recognition (ASR) model developed by Facebook based on the Wav2Vec 2.0 architecture, supporting Portuguese speech-to-text tasks.

Speech Recognition Other

Wav2vec2 Large Xlsr 53 Polish

A Polish automatic speech recognition model developed by Facebook, based on the Wav2Vec2 architecture and the XLSR-53 multilingual pretrained model.

Speech Recognition Other

© 2025 AIbase