# Unsupervised pretraining
- **Depth Anything V2 Small** · depth-anything · Apache-2.0 · 3D Vision · English · 55.22k downloads · 64 likes
  Depth Anything V2 is a state-of-the-art monocular depth estimation model, trained on large-scale synthetic and real images. Compared to V1, it captures finer details and is more robust.
- **Viwav2vec2 Base 1.5k** · dragonSwing · Speech Recognition · Transformers · Other · 38 downloads · 0 likes
  Pretrained on 1.5k hours of Vietnamese speech data and suited to Vietnamese speech recognition tasks; it requires fine-tuning before use.
- **Wav2vec2 Base 10k Voxpopuli** · facebook · Speech Recognition · Transformers · Other · 2,504 downloads · 0 likes
  A foundational speech model pretrained on 10,000 hours of unlabeled data from the VoxPopuli corpus, supporting multilingual speech processing.
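The VoxPopuli Wav2Vec2 checkpoints in this list are pretrained purely self-supervised: spans of latent speech frames are masked, and the model is trained contrastively to pick out the true quantized latents for the masked positions. Below is a minimal stdlib sketch of just the span-masking step; the mask probability and span length follow the wav2vec 2.0 paper's defaults, while the function name, frame count, and seed are illustrative choices, not anything from these model cards.

```python
import random

def sample_mask(num_frames, mask_prob=0.065, span_len=10, rng=None):
    """wav2vec 2.0-style span masking: each frame is chosen as a span
    start with probability mask_prob, and span_len consecutive frames
    from each start are masked (spans may overlap)."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for i in range(start, min(start + span_len, num_frames)):
                mask[i] = True
    return mask

mask = sample_mask(200)
print(sum(mask), "of", len(mask), "frames masked")
```

Because spans overlap, the effective fraction of masked frames (roughly half, with these defaults) is much higher than the 6.5% span-start probability, which is why the contrastive task stays hard.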
- **Wav2vec2 Base Sl Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 31 downloads · 0 likes
  A speech model based on Facebook's Wav2Vec2 architecture, pretrained specifically for Slovenian (sl) on 11.3k hours of unlabeled data from the VoxPopuli corpus.
- **Wav2vec2 Base Pl Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 30 downloads · 0 likes
  Polish Wav2Vec2 base model pretrained on the VoxPopuli corpus; suitable for speech recognition tasks.
- **T5 V1 1 Small** · google · Apache-2.0 · Large Language Model · English · 127.68k downloads · 26 likes
  T5 Version 1.1 is Google's improved text-to-text model: it uses the GEGLU activation in its feed-forward blocks and was pretrained with an unsupervised objective on the C4 dataset only, so it requires fine-tuning before use.
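The GEGLU feed-forward variant mentioned above projects the input twice and gates one projection with the GELU of the other: GEGLU(x) = GELU(xW) ⊙ (xV). Here is a pure-Python sketch on plain lists; the toy 2×2 weight matrices are arbitrary stand-ins for illustration, not the model's parameters.

```python
import math

def gelu(x):
    # Exact GELU via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def geglu(x, w, v):
    """GEGLU(x) = GELU(x @ W) * (x @ V), elementwise over the hidden dim."""
    def matvec(m, vec):
        return [sum(mi * xi for mi, xi in zip(row, vec)) for row in m]
    gate = matvec(w, x)   # gating path, passed through GELU
    value = matvec(v, x)  # linear value path
    return [gelu(g) * b for g, b in zip(gate, value)]

# Toy 2 -> 2 projections (illustrative weights only)
out = geglu([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]])
print(out)
```

Compared with a plain ReLU feed-forward layer, the gated form lets the value path carry a signed, unsquashed signal while the GELU gate decides how much of it passes through.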
- **Wav2vec2 Base Pt Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 30 downloads · 0 likes
  Wav2Vec2 base model pretrained on the Portuguese portion of the VoxPopuli corpus; suitable for speech recognition tasks.
- **Wav2vec2 Large Romance Voxpopuli V2** · facebook · Speech Recognition · Transformers · 26 downloads · 0 likes
  Facebook's Wav2Vec2 large model, pretrained only on 101.5k hours of unlabeled data from the Romance-language portion of the VoxPopuli corpus; suitable for speech recognition tasks.
- **Wav2vec2 Large Mt Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 25 downloads · 0 likes
  Facebook's Wav2Vec2 large model, pretrained exclusively on unlabeled Maltese (mt) data from the VoxPopuli corpus; suitable for speech recognition tasks.
- **Mgpt** · THUMT · Large Language Model · Transformers · 147 downloads · 8 likes
  mGPT is a multilingual generation model pretrained on the mC4 dataset, supporting 101 languages and using a GPT-2-like Transformer architecture.
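A GPT-2-like decoder such as mGPT enforces left-to-right generation with a causal attention mask (position i may attend only to positions ≤ i) and is trained to predict each token from its prefix. A minimal stdlib sketch of both ideas; the function names and token IDs are illustrative only.

```python
def causal_mask(n):
    """n x n lower-triangular mask: True where attention is allowed,
    i.e. query position i may attend to key position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def lm_pairs(token_ids):
    """Causal LM training pairs: each token is the prediction target
    for the position immediately before it."""
    return list(zip(token_ids[:-1], token_ids[1:]))

print(causal_mask(3))
# [[True, False, False], [True, True, False], [True, True, True]]
print(lm_pairs([5, 7, 9, 2]))
# [(5, 7), (7, 9), (9, 2)]
```

This shift-by-one pairing is what makes causal pretraining fully unsupervised: raw text supplies both inputs and targets.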
- **Wav2vec2 Base Sv Voxpopuli** · facebook · Speech Recognition · Transformers · Other · 33 downloads · 0 likes
  A Wav2Vec2 base model pretrained on the Swedish subset of the VoxPopuli corpus; suitable for Swedish speech recognition tasks.
- **Wav2vec2 Base Sk Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 31 downloads · 0 likes
  Wav2Vec2 base model pretrained on Slovak data from the VoxPopuli corpus; suitable for speech recognition tasks.
- **Mt5 Base** · google · Apache-2.0 · Large Language Model · Supports Multiple Languages · 118.49k downloads · 229 likes
  mT5 is a multilingual variant of the T5 model, pretrained on the mC4 corpus covering 101 languages; suitable for multilingual text processing tasks.
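Like T5, mT5 is pretrained with an unsupervised span-corruption objective: random token spans are replaced by sentinel markers in the input, and the target reconstructs the dropped spans in order. A minimal sketch on a whitespace-tokenized sentence; the sentinel naming follows T5's `<extra_id_N>` convention, but the function name and the fixed span choices are illustrative (real pretraining samples spans randomly).

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) token span with a sentinel in the input;
    the target lists each sentinel followed by the tokens it replaced,
    closed by one final sentinel."""
    inp, tgt, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[cursor:start])  # keep text before the span
        inp.append(sentinel)              # drop the span, mark its place
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])     # target recovers the dropped span
        cursor = end
    inp.extend(tokens[cursor:])
    tgt.append(f"<extra_id_{len(spans)}>")  # terminating sentinel
    return " ".join(inp), " ".join(tgt)

tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (6, 8)])
print(inp)  # Thank <extra_id_0> for inviting me to <extra_id_1> last week
print(tgt)  # <extra_id_0> you <extra_id_1> your party <extra_id_2>
```

Because the target contains only the dropped spans rather than the whole sentence, this objective is cheaper than full reconstruction while still forcing the model to use bidirectional context.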
- **Wav2vec2 Base Et Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 30 downloads · 0 likes
  A speech model based on Facebook's Wav2Vec2 framework, pretrained specifically for Estonian.
- **Wav2vec2 Base Cs Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 33 downloads · 1 like
  Wav2Vec2 base model pretrained on the VoxPopuli corpus, specialized for Czech speech processing.
- **Wav2vec2 Base It Voxpopuli** · facebook · Speech Recognition · Transformers · Other · 32 downloads · 0 likes
  Wav2Vec2 base model pretrained on unlabeled Italian data from VoxPopuli; suitable for speech recognition tasks.
- **Wav2vec2 Base De Voxpopuli V2** · facebook · Speech Recognition · Transformers · German · 44 downloads · 1 like
  A German speech model based on Facebook's Wav2Vec2 architecture, pretrained on 23.2k hours of unlabeled German data from the VoxPopuli corpus.
- **Wav2vec2 Base Nl Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 22 downloads · 0 likes
  A speech model based on Facebook's Wav2Vec2 architecture, pretrained specifically for Dutch on 19.0k hours of unlabeled data from the VoxPopuli corpus.
- **Wav2vec2 Large Nl Voxpopuli** · facebook · Speech Recognition · Other · 18 downloads · 0 likes
  An automatic speech recognition model pretrained on the Dutch subset of the VoxPopuli corpus.
- **Wav2vec2 Large It Voxpopuli** · facebook · Speech Recognition · Other · 55 downloads · 0 likes
  A speech recognition model pretrained on unlabeled Italian data from VoxPopuli, using Facebook's Wav2Vec2 architecture.
- **Wav2vec2 Base Lt Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 31 downloads · 0 likes
  A speech model based on Facebook's Wav2Vec2 architecture, pretrained specifically for Lithuanian on 14.4k hours of unlabeled data from the VoxPopuli corpus.
- **Wav2vec2 Base Hu Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 30 downloads · 0 likes
  A speech pretraining model based on Facebook's Wav2Vec2 architecture, pretrained on Hungarian data from the VoxPopuli corpus.
- **Wav2vec2 Base Bg Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 30 downloads · 0 likes
  A speech model based on Facebook's Wav2Vec2 architecture, pretrained specifically for Bulgarian; suitable for speech recognition tasks.
- **Wav2vec2 Base Lv Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 29 downloads · 1 like
  A speech base model built on Facebook's Wav2Vec2 architecture, pretrained specifically for Latvian (lv) on 13.1k hours of unlabeled data from the VoxPopuli corpus.
- **Wav2vec2 Base Fr Voxpopuli** · facebook · Speech Recognition · Transformers · French · 30 downloads · 0 likes
  Wav2Vec2 base model pretrained on unannotated French data from VoxPopuli; suitable for French speech recognition tasks.
- **Mt5 Xxl** · google · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 7,532 downloads · 68 likes
  mT5 is Google's multilingual text-to-text model, supporting 101 languages and pretrained on the mC4 dataset; suitable for a wide range of NLP tasks.
- **Wav2vec2 Base 100k Voxpopuli** · facebook · Speech Recognition · Transformers · Other · 148 downloads · 4 likes
  A speech base model pretrained on 100,000 hours of unannotated data from the VoxPopuli corpus.
- **Wav2vec2 Base Es Voxpopuli V2** · facebook · Speech Recognition · Transformers · Spanish · 46 downloads · 1 like
  Wav2Vec2 base model pretrained on 21.4k hours of unlabeled Spanish data; suitable for speech recognition tasks.
- **Wav2vec2 Large West Germanic Voxpopuli V2** · facebook · Speech Recognition · Transformers · 25 downloads · 1 like
  Facebook's Wav2Vec2 large model, pretrained exclusively on 66.3k hours of unlabeled data from the West Germanic languages in the VoxPopuli corpus.
- **Wav2vec2 Large El Voxpopuli V2** · facebook · Speech Recognition · Transformers · Other · 24 downloads · 0 likes
  Greek speech recognition model pretrained on 17.7k hours of unlabeled data from the VoxPopuli corpus.
- **Mt5 Xl** · google · Apache-2.0 · Large Language Model · Supports Multiple Languages · 3,104 downloads · 24 likes
  mT5 is the multilingual version of the T5 model, supporting 101 languages and pretrained on the mC4 corpus; suitable for a variety of natural language processing tasks.
- **Legal T5 Small Multitask Sv En** · SEBIS · Machine Translation · 17 downloads · 0 likes
  A multitask model for translating Swedish legal text to English, trained by combining a supervised translation task with an unsupervised masked language modeling task.
- **Legal T5 Small Trans Cs En Small Finetuned** · SEBIS · Machine Translation · 18 downloads · 0 likes
  A small T5 model (~60M-parameter architecture) for translating Czech legal text to English.