Access Global AI Models - Power Next-Gen Apps
From General to Specialized AI - All Models in One Platform

23,189 models match the criteria

Nsfw Image Detection
Apache-2.0
An NSFW image classification model based on the ViT architecture, pre-trained on ImageNet-21k via supervised learning and fine-tuned on 80,000 images to distinguish between normal and NSFW content.
Image Classification, Transformers
Falconsai
82.4M downloads
588 likes
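A minimal usage sketch with the Hugging Face transformers image-classification pipeline; the repository id Falconsai/nsfw_image_detection and the image path are assumptions for illustration:

# Sketch: flag an image as normal vs. NSFW (assumed repo id: Falconsai/nsfw_image_detection).
from transformers import pipeline

classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier("photo.jpg"))  # returns the candidate labels with confidence scores
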
Fairface Age Image Detection
Apache-2.0
An image classification model based on the Vision Transformer architecture, pre-trained on ImageNet-21k and fine-tuned on the FairFace dataset to predict age groups from face images
Image Classification, Transformers
dima806
76.6M downloads
10 likes
Clip Vit Large Patch14
CLIP is a vision-language model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, supporting zero-shot image classification.
Image-to-Text
openai
44.7M downloads
1,710 likes
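A short zero-shot classification sketch using the transformers CLIP classes with openai/clip-vit-large-patch14; the image path and candidate captions are placeholders:

# Sketch: zero-shot image classification by comparing an image against candidate captions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

captions = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=captions, images=Image.open("photo.jpg"), return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # image-text similarity as probabilities
print(dict(zip(captions, probs[0].tolist())))
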
Phi 2 GGUF
Other
Phi-2 is a small yet powerful language model developed by Microsoft, featuring 2.7 billion parameters, focusing on efficient inference and high-quality text generation.
Large Language Model, Supports Multiple Languages
TheBloke
41.5M downloads
205 likes
Chronos T5 Small
Apache-2.0
Chronos is a family of pre-trained time series forecasting models based on language model architectures. It converts time series into token sequences through quantization and scaling for training, suitable for probabilistic forecasting tasks.
Climate Model, Transformers
amazon
22.8M downloads
66 likes
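A forecasting sketch assuming the chronos-forecasting package and its ChronosPipeline API; the context series is made up for illustration:

# Sketch: probabilistic forecasting with Chronos (assumes the chronos-forecasting package).
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])  # toy series
samples = pipeline.predict(context, prediction_length=4)  # shape: [series, samples, horizon]
print(samples.quantile(0.5, dim=1))  # median forecast for each future step
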
Roberta Large
MIT
A large English language model pre-trained with masked language modeling objectives, using improved BERT training methods
Large Language Model, English
FacebookAI
19.4M downloads
212 likes
Clip Vit Base Patch32
CLIP is a multimodal model developed by OpenAI that can understand the relationship between images and text, supporting zero-shot image classification tasks.
Image-to-Text
openai
14.0M downloads
666 likes
Segmentation 3.0
MIT
This is a powerset-encoded speaker diarization model capable of processing 10-second audio clips to identify multiple speakers and their overlapping speech.
Speaker Analysis
pyannote
12.6M downloads
445 likes
Speaker Diarization 3.1
MIT
An audio processing model for speaker segmentation that can automatically detect and segment different speakers in audio.
Speaker Analysis
pyannote
11.7M downloads
822 likes
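A diarization sketch with pyannote.audio; the model is gated on Hugging Face, so an access token (shown as a placeholder) and the audio file name are assumptions:

# Sketch: who-spoke-when segmentation of a recording (gated model, HF token assumed).
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token="hf_your_token_here")
diarization = pipeline("meeting.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
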
Distilbert Base Uncased
Apache-2.0
DistilBERT is a distilled version of the BERT base model, maintaining similar performance while being more lightweight and efficient, suitable for natural language processing tasks such as sequence classification and token classification.
Large Language Model, English
distilbert
11.1M downloads
669 likes
Clipseg Rd64 Refined
Apache-2.0
CLIPSeg is an image segmentation model based on text and image prompts, supporting zero-shot and one-shot image segmentation tasks.
Image Segmentation, Transformers
CIDAS
10.0M downloads
122 likes
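A text-prompted segmentation sketch using the transformers CLIPSeg classes; the image path and prompts are placeholders:

# Sketch: produce one low-resolution mask per text prompt with CLIPSeg.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("street.jpg")
prompts = ["a car", "a tree"]
inputs = processor(text=prompts, images=[image] * len(prompts), return_tensors="pt", padding=True)
with torch.no_grad():
    masks = torch.sigmoid(model(**inputs).logits)  # one mask per prompt, values in [0, 1]
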
Llama 3.1 8B Instruct GGUF
Meta Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases, performing strongly on common industry benchmarks.
Large Language Model, English
modularai
9.7M downloads
4 likes
Xlm Roberta Base
MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, using masked language modeling as the training objective.
Large Language Model, Supports Multiple Languages
FacebookAI
9.6M downloads
664 likes
Roberta Base
MIT
An English pre-trained model based on Transformer architecture, trained on massive text through masked language modeling objectives, supporting text feature extraction and downstream task fine-tuning
Large Language Model, English
FacebookAI
9.3M downloads
488 likes
Segmentation
MIT
An audio processing model for voice activity detection, overlap detection, and speaker diarization
Speaker Analysis
pyannote
9.2M downloads
579 likes
Vit Face Expression
Apache-2.0
A facial emotion recognition model fine-tuned based on Vision Transformer (ViT), supporting 7 expression classifications
Face-related, Transformers
trpakov
9.2M downloads
66 likes
Voice Activity Detection
MIT
Voice activity detection model based on pyannote.audio 2.1, used to identify speech activity segments in audio
Speech Recognition
pyannote
7.7M downloads
181 likes
Opt 125m
Other
OPT is an open pre-trained Transformer language model suite released by Meta AI, with parameter sizes ranging from 125 million to 175 billion, designed to match the performance of the GPT-3 series while promoting open research in large-scale language models.
Large Language Model, English
facebook
6.3M downloads
198 likes
Chronos Bolt Small
Apache-2.0
Chronos-Bolt is a series of pretrained time series foundation models based on the T5 architecture, achieving efficient time series forecasting through innovative chunk encoding and direct multi-step prediction
Climate Model
autogluon
6.2M downloads
13 likes
1
A pretrained model based on the transformers library, suitable for various NLP tasks
Large Language Model, Transformers
unslothai
6.2M downloads
1 like
Siglip So400m Patch14 384
Apache-2.0
SigLIP is a vision-language model pre-trained on the WebLi dataset, employing an improved sigmoid loss function to optimize image-text matching tasks.
Image-to-Text, Transformers
google
6.1M downloads
526 likes
Clip Vit Large Patch14 336
A large-scale vision-language pretrained model based on the Vision Transformer architecture, supporting cross-modal understanding between images and text
Text-to-Image, Transformers
openai
5.9M downloads
241 likes
Llama 3.1 8B Instruct
Llama 3.1 is Meta's multilingual large language model series, available at 8B, 70B, and 405B parameter scales, supporting 8 languages and code generation, and optimized for multilingual dialogue scenarios.
Large Language Model, Transformers, Supports Multiple Languages
meta-llama
5.7M downloads
3,898 likes
T5 Base
Apache-2.0
T5-Base is a 220-million-parameter text-to-text Transformer model developed by Google, supporting multilingual NLP tasks through a unified text-to-text framework.
Large Language Model, Supports Multiple Languages
google-t5
5.4M downloads
702 likes
Xlm Roberta Large
MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, trained with a masked language modeling objective.
Large Language Model, Supports Multiple Languages
FacebookAI
5.3M downloads
431 likes
Distilbert Base Uncased Finetuned Sst 2 English
Apache-2.0
A text classification model based on DistilBERT-base-uncased and fine-tuned on the SST-2 sentiment analysis dataset, reaching 91.3% accuracy
Text Classification, English
distilbert
5.2M downloads
746 likes
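A sentiment-analysis sketch with the transformers pipeline; the example sentence is arbitrary:

# Sketch: binary sentiment classification with the SST-2 fine-tuned DistilBERT checkpoint.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("This catalog makes comparing models easy."))  # e.g. label POSITIVE with a score
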
Dinov2 Small
Apache-2.0
A small-scale vision Transformer model trained using the DINOv2 method, extracting image features through self-supervised learning
Image Classification, Transformers
facebook
5.0M downloads
31 likes
Wav2vec2 Large Xlsr 53 Portuguese
Apache-2.0
This is a fine-tuned XLSR-53 large model for Portuguese speech recognition tasks, trained on the Common Voice 6.1 dataset, supporting Portuguese speech-to-text conversion.
Speech Recognition, Other
jonatasgrosman
4.9M downloads
32 likes
Vit Base Patch16 224
Apache-2.0
Vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks
Image Classification
google
4.8M downloads
775 likes
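An image-classification sketch with the transformers pipeline; the image path is a placeholder:

# Sketch: ImageNet classification with the fine-tuned ViT checkpoint.
from transformers import pipeline

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
print(classifier("cat.jpg", top_k=3))  # top-3 predicted labels with scores
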
Chronos Bolt Base
Apache-2.0
Chronos-Bolt is a series of pretrained time series forecasting models that support zero-shot prediction with high accuracy and fast inference speed.
Climate Model
autogluon
4.7M downloads
22 likes
Whisper Large V3
Apache-2.0
Whisper is an advanced automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization capabilities.
Speech Recognition, Supports Multiple Languages
openai
4.6M downloads
4,321 likes
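A transcription sketch with the transformers automatic-speech-recognition pipeline; the audio file name is a placeholder:

# Sketch: speech-to-text with Whisper large-v3; timestamps enable long-form audio.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = asr("interview.wav", return_timestamps=True)
print(result["text"])
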
Clip Vit Base Patch16
CLIP is a multimodal model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, enabling zero-shot image classification capabilities.
Image-to-Text
openai
4.6M downloads
119 likes
Whisper Large V3 Turbo
MIT
Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, demonstrating strong generalization capabilities in zero-shot settings.
Speech Recognition, Transformers, Supports Multiple Languages
openai
4.0M downloads
2,317 likes
Wav2vec2 Large Xlsr 53 Russian
Apache-2.0
A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input
Speech Recognition, Other
jonatasgrosman
3.9M downloads
54 likes
Bart Large Cnn
MIT
BART model pre-trained on English corpus, specifically fine-tuned for the CNN/Daily Mail dataset, suitable for text summarization tasks
Text Generation, English
facebook
3.8M downloads
1,364 likes
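A summarization sketch with the transformers pipeline; the article file is a placeholder:

# Sketch: abstractive summarization with the CNN/Daily Mail fine-tuned BART model.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = open("article.txt").read()  # any English news article
print(summarizer(article, max_length=130, min_length=30, do_sample=False)[0]["summary_text"])
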
Wav2vec2 Large Xlsr 53 Chinese Zh Cn
Apache-2.0
A Chinese speech recognition model fine-tuned based on facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampling rate audio input.
Speech Recognition, Chinese
jonatasgrosman
3.8M downloads
110 likes
Fashion Clip
MIT
FashionCLIP is a vision-language model fine-tuned specifically for the fashion domain based on CLIP, capable of generating universal product representations.
Text-to-Image, Transformers, English
patrickjohncyh
3.8M downloads
222 likes
Jina Embeddings V3
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.
Text Embedding, Transformers, Supports Multiple Languages
jinaai
3.7M downloads
911 likes
Stable Diffusion V1 5
Openrail
Stable Diffusion is a latent text-to-image diffusion model capable of generating realistic images from any text input.
Image Generation
stable-diffusion-v1-5
3.7M downloads
518 likes
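A text-to-image sketch with diffusers; the repo id stable-diffusion-v1-5/stable-diffusion-v1-5 and a CUDA GPU are assumptions here:

# Sketch: generate an image from a text prompt (assumed repo id, GPU assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
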
Bart Large Mnli
MIT
Zero-shot classification model based on BART-large architecture, fine-tuned on MultiNLI dataset
Large Language Model
facebook
3.7M downloads
1,364 likes
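A zero-shot classification sketch with the transformers pipeline; the example text and candidate labels are arbitrary:

# Sketch: classify text against labels the model never saw during fine-tuning.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(classifier("The new GPU cut our training time in half.",
                 candidate_labels=["hardware", "finance", "sports"]))
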
T5 Small
Apache-2.0
T5-Small is a 60-million-parameter text-to-text Transformer model developed by Google, using a unified text-to-text framework to handle various NLP tasks
Large Language Model, Supports Multiple Languages
google-t5
3.7M downloads
450 likes
Esm2 T36 3B UR50D
MIT
ESM-2 is a next-generation protein model trained with masked language modeling objectives, suitable for fine-tuning on various downstream tasks with protein sequences as input.
Protein Model, Transformers
facebook
3.5M downloads
22 likes
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase