
Access Global AI Models - Power Next-Gen Apps
From General to Specialized AI - All Models in One Platform
9,994 models match the criteria
Nsfw Image Detection | Falconsai | Apache-2.0 | Image Classification, Transformers | 82.4M downloads, 588 likes
An NSFW image classification model based on the ViT architecture, pre-trained on ImageNet-21k via supervised learning and fine-tuned on 80,000 images to distinguish between normal and NSFW content.

Fairface Age Image Detection | dima806 | Apache-2.0 | Image Classification, Transformers | 76.6M downloads, 10 likes
An image classification model based on the Vision Transformer architecture, pre-trained on the ImageNet-21k dataset and suitable for multi-category image classification tasks.
Clip Vit Large Patch14 | openai | Image-to-Text | 44.7M downloads, 1,710 likes
CLIP is a vision-language model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, supporting zero-shot image classification.
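The shared embedding space described above is what makes zero-shot classification possible: the image embedding is compared against one text embedding per candidate label, and the closest match wins. A minimal sketch with made-up vectors standing in for real CLIP encoder outputs:

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and several
    text (label) embeddings, softmax-normalized - the core of
    CLIP-style zero-shot classification."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img                     # cosine similarities
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy embeddings, NOT real CLIP outputs - purely illustrative.
image = np.array([0.9, 0.1, 0.2])
labels = np.array([[1.0, 0.0, 0.1],       # e.g. "a photo of a cat"
                   [0.0, 1.0, 0.0]])      # e.g. "a photo of a dog"
probs = zero_shot_scores(image, labels)
print(probs)  # probability per candidate label
```

The real model produces the embeddings with separate image and text encoders trained jointly; only the comparison step is shown here.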
Chronos T5 Small | amazon | Apache-2.0 | Climate Model, Transformers | 22.8M downloads, 66 likes
Chronos is a family of pre-trained time series forecasting models based on language model architectures. It converts time series into token sequences through quantization and scaling, and is suited to probabilistic forecasting tasks.
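The quantization-and-scaling step mentioned above can be sketched in a few lines: mean-scale the series, then map each value into one of a fixed number of uniform bins to get integer token IDs. This is a simplified illustration, not the exact Chronos implementation (the real tokenizer's bin layout, special tokens, and scaling details differ):

```python
import numpy as np

def tokenize_series(values, n_bins=10, low=-3.0, high=3.0):
    """Sketch of Chronos-style time-series tokenization:
    mean-scale the series, then uniformly quantize into bin IDs.
    Simplified for illustration only."""
    values = np.asarray(values, dtype=float)
    scale = np.abs(values).mean() or 1.0      # mean scaling
    scaled = values / scale
    # Map each scaled value to a bin index in [0, n_bins - 1].
    bins = np.linspace(low, high, n_bins + 1)
    tokens = np.clip(np.digitize(scaled, bins) - 1, 0, n_bins - 1)
    return tokens, scale

tokens, scale = tokenize_series([10.0, 12.0, 9.0, 30.0])
print(tokens, scale)
```

Once the series is a token sequence, a standard language-model architecture can be trained on it, and sampling multiple continuations yields a probabilistic forecast.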
Roberta Large | FacebookAI | MIT | Large Language Model, English | 19.4M downloads, 212 likes
A large English language model pre-trained with the masked language modeling objective, using improved BERT training methods.
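The masked language modeling objective used by this and the other BERT-style entries corrupts the input by hiding a fraction of tokens and training the model to recover them. A minimal sketch of the input-corruption step, assuming the conventional ~15% mask ratio and a `[MASK]` symbol (the whitespace tokenizer here is purely illustrative):

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=1):
    """Sketch of masked-LM input corruption: randomly replace ~15%
    of tokens with a mask symbol and remember the originals as
    prediction targets. Real tokenizers work on subwords."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_ratio:
            masked.append(mask_token)
            targets[i] = tok          # the model must predict this
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked, targets)
```

Training then minimizes cross-entropy between the model's prediction at each masked position and the hidden original token.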
Distilbert Base Uncased | distilbert | Apache-2.0 | Large Language Model, English | 11.1M downloads, 669 likes
DistilBERT is a distilled version of the BERT base model, maintaining similar performance while being more lightweight and efficient, suitable for natural language processing tasks such as sequence classification and token classification.
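Knowledge distillation, the technique behind DistilBERT and DistilGPT2 in this list, trains the small student model to match the teacher's softened output distribution. A numpy sketch of the soft-target loss term (the temperature value is illustrative, and real training combines this with the ordinary hard-label loss):

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-scaled softmax over a logit vector."""
    z = np.exp((x - x.max()) / T)
    return z / z.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Sketch of the soft-target part of knowledge distillation:
    cross-entropy between the teacher's and the student's
    temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -np.sum(p_teacher * log_p_student)

teacher = np.array([3.0, 1.0, 0.2])   # made-up teacher logits
student = np.array([2.5, 1.2, 0.3])   # made-up student logits
print(distillation_loss(student, teacher))
```

The loss is minimized exactly when the student reproduces the teacher's softened distribution, which is why distilled models retain most of the teacher's behavior at a fraction of the size.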
Clipseg Rd64 Refined | CIDAS | Apache-2.0 | Image Segmentation, Transformers | 10.0M downloads, 122 likes
CLIPSeg is an image segmentation model driven by text and image prompts, supporting zero-shot and one-shot image segmentation tasks.

Xlm Roberta Base | FacebookAI | MIT | Large Language Model, Multilingual | 9.6M downloads, 664 likes
XLM-RoBERTa is a multilingual model pre-trained on 2.5TB of filtered CommonCrawl data across 100 languages, using masked language modeling as the training objective.

Roberta Base | FacebookAI | MIT | Large Language Model, English | 9.3M downloads, 488 likes
An English pre-trained model based on the Transformer architecture, trained on massive text with the masked language modeling objective, supporting text feature extraction and downstream fine-tuning.

Vit Face Expression | trpakov | Apache-2.0 | Face-related, Transformers | 9.2M downloads, 66 likes
A facial emotion recognition model fine-tuned from a Vision Transformer (ViT), supporting 7 expression classes.
Chronos Bolt Small | autogluon | Apache-2.0 | Climate Model | 6.2M downloads, 13 likes
Chronos-Bolt is a series of pre-trained time series foundation models based on the T5 architecture, achieving efficient forecasting through chunked encoding and direct multi-step prediction.

1 | unslothai | Large Language Model, Transformers | 6.2M downloads, 1 like
A pretrained model based on the transformers library, suitable for various NLP tasks.
Siglip So400m Patch14 384 | google | Apache-2.0 | Image-to-Text, Transformers | 6.1M downloads, 526 likes
SigLIP is a vision-language model pre-trained on the WebLI dataset, employing an improved sigmoid loss function to optimize image-text matching.
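The improved sigmoid loss noted above differs from CLIP's softmax contrastive loss by scoring every image-text pair independently as a binary match/non-match decision, so no batch-wide normalization is needed. A minimal sketch over a made-up similarity matrix (the real loss also learns a temperature and bias, omitted here):

```python
import numpy as np

def sigmoid_pairwise_loss(logits):
    """Sketch of a SigLIP-style sigmoid loss: each entry of the
    image-text similarity matrix is an independent binary
    classification, with matching pairs on the diagonal."""
    n = logits.shape[0]
    labels = 2 * np.eye(n) - 1            # +1 on diagonal, -1 elsewhere
    # -log sigmoid(label * logit) == log(1 + exp(-label * logit))
    return np.mean(np.log1p(np.exp(-labels * logits)))

# Made-up image-text similarity logits for a batch of 2 pairs.
logits = np.array([[ 4.0, -2.0],
                   [-3.0,  5.0]])
print(sigmoid_pairwise_loss(logits))
```

Because each pair is scored on its own, the loss decomposes over the batch, which is what lets SigLIP scale training without computing a full softmax over all pairs.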
Llama 3.1 8B Instruct | meta-llama | Large Language Model, Transformers, Multilingual | 5.7M downloads, 3,898 likes
Llama 3.1 is Meta's multilingual large language model series, available at 8B, 70B, and 405B parameter scales, supporting 8 languages and code generation, and optimized for multilingual dialogue.

T5 Base | google-t5 | Apache-2.0 | Large Language Model, Multilingual | 5.4M downloads, 702 likes
T5 Base is a text-to-text Transformer model developed by Google with 220 million parameters, supporting multilingual NLP tasks.

Xlm Roberta Large | FacebookAI | MIT | Large Language Model, Multilingual | 5.3M downloads, 431 likes
XLM-RoBERTa is a multilingual model pre-trained on 2.5TB of filtered CommonCrawl data across 100 languages, trained with a masked language modeling objective.

Distilbert Base Uncased Finetuned Sst 2 English | distilbert | Apache-2.0 | Text Classification, English | 5.2M downloads, 746 likes
A text classification model fine-tuned from distilbert-base-uncased on the SST-2 sentiment analysis dataset, reaching 91.3% accuracy.

Dinov2 Small | facebook | Apache-2.0 | Image Classification, Transformers | 5.0M downloads, 31 likes
A small-scale vision Transformer trained with the DINOv2 method, extracting image features through self-supervised learning.

Vit Base Patch16 224 | google | Apache-2.0 | Image Classification | 4.8M downloads, 775 likes
A Vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks.

Chronos Bolt Base | autogluon | Apache-2.0 | Climate Model | 4.7M downloads, 22 likes
Chronos-Bolt is a series of pre-trained time series forecasting models that support zero-shot prediction with high accuracy and fast inference.

Whisper Large V3 | openai | Apache-2.0 | Speech Recognition, Multilingual | 4.6M downloads, 4,321 likes
Whisper is an advanced automatic speech recognition (ASR) and speech translation model from OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization.
Whisper Large V3 Turbo | openai | MIT | Speech Recognition, Transformers, Multilingual | 4.0M downloads, 2,317 likes
Whisper is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data and demonstrating strong generalization in zero-shot settings.

Bart Large Cnn | facebook | MIT | Text Generation, English | 3.8M downloads, 1,364 likes
A BART model pre-trained on an English corpus and fine-tuned on the CNN/Daily Mail dataset, suitable for text summarization tasks.

Fashion Clip | patrickjohncyh | MIT | Text-to-Image, Transformers, English | 3.8M downloads, 222 likes
FashionCLIP is a vision-language model fine-tuned from CLIP for the fashion domain, capable of generating general-purpose product representations.

Jina Embeddings V3 | jinaai | Text Embedding, Transformers, Multilingual | 3.7M downloads, 911 likes
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.

Stable Diffusion V1 5 | stable-diffusion-v1-5 | Openrail | Image Generation | 3.7M downloads, 518 likes
Stable Diffusion is a latent text-to-image diffusion model capable of generating realistic images from any text input.

Bart Large Mnli | facebook | MIT | Large Language Model | 3.7M downloads, 1,364 likes
A zero-shot classification model based on the BART-large architecture, fine-tuned on the MultiNLI dataset.
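Zero-shot classification with an NLI model like this one works by recasting each candidate label as an entailment hypothesis: the input text is the premise, and the label the model most strongly entails wins. A sketch of the recasting step (the template string is the commonly used convention; the actual entailment scoring requires the model and is not shown):

```python
def build_nli_pairs(text, labels, template="This example is {}."):
    """Recast zero-shot classification as NLI: the input text is
    the premise, and each candidate label becomes a hypothesis.
    An NLI model's entailment score then ranks the labels."""
    return [(text, template.format(label)) for label in labels]

pairs = build_nli_pairs("one day I will see the world",
                        ["travel", "cooking"])
print(pairs)
```

Each (premise, hypothesis) pair is fed to the NLI model separately, and the label whose hypothesis gets the highest entailment probability is returned as the prediction.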
T5 Small | google-t5 | Apache-2.0 | Large Language Model, Multilingual | 3.7M downloads, 450 likes
T5-Small is a 60-million-parameter model developed by Google, using a unified text-to-text framework to handle various NLP tasks.
Flan T5 Base | google | Apache-2.0 | Large Language Model, Multilingual | 3.3M downloads, 862 likes
FLAN-T5 is an instruction-fine-tuned version of T5, supporting multilingual task processing and outperforming the original T5 at the same parameter count.

Albert Base V2 | albert | Apache-2.0 | Large Language Model, English | 3.1M downloads, 121 likes
ALBERT is a lightweight pre-trained language model based on the Transformer architecture that reduces memory usage through parameter sharing, suitable for English text processing tasks.

Distilbert Base Multilingual Cased | distilbert | Apache-2.0 | Large Language Model, Transformers, Multilingual | 2.8M downloads, 187 likes
DistilBERT is a distilled version of the multilingual BERT base model, retaining 97% of BERT's performance with fewer parameters and faster inference. It supports 104 languages and suits a variety of natural language processing tasks.

Distilgpt2 | distilbert | Apache-2.0 | Large Language Model, English | 2.7M downloads, 527 likes
DistilGPT2 is a lightweight distilled version of GPT-2 with 82 million parameters, retaining GPT-2's core text generation capabilities while being smaller and faster.

Xlm Roberta Base Language Detection | papluca | MIT | Text Classification, Transformers, Multilingual | 2.7M downloads, 333 likes
A language detection model based on XLM-RoBERTa, supporting text classification across 20 languages.

Table Transformer Detection | microsoft | MIT | Object Detection, Transformers | 2.6M downloads, 349 likes
A table detection model based on the DETR architecture, designed for extracting tables from unstructured documents.

Blip Image Captioning Large | Salesforce | Bsd-3-clause | Image-to-Text, Transformers | 2.5M downloads, 1,312 likes
BLIP is a unified vision-language pre-training framework that excels at image captioning, supporting both conditional and unconditional caption generation.
Ms Marco MiniLM L6 V2 | cross-encoder | Apache-2.0 | Text Embedding, English | 2.5M downloads, 86 likes
A cross-encoder model trained on the MS MARCO passage ranking task for query-passage relevance scoring in information retrieval.

Mms 300m 1130 Forced Aligner | MahmoudAshraf | Speech Recognition, Transformers, Multilingual | 2.5M downloads, 50 likes
A text-to-audio forced alignment tool based on a Hugging Face pre-trained model, supporting multiple languages with high memory efficiency.

Llama 3.2 1B Instruct | meta-llama | Large Language Model, Transformers, Multilingual | 2.4M downloads, 901 likes
Llama 3.2 is a multilingual large language model series developed by Meta, including 1B and 3B pre-trained and instruction-tuned generative models, optimized for multilingual dialogue, retrieval, and summarization tasks.

Stable Diffusion Xl Base 1.0 | stabilityai | Image Generation | 2.4M downloads, 6,545 likes
SDXL 1.0 is a diffusion-based text-to-image generation model that employs an ensemble-of-experts latent diffusion pipeline, supporting high-resolution image generation.

Qwen2.5 0.5B Instruct | Gensyn | Apache-2.0 | Large Language Model, Transformers, English | 2.4M downloads, 5 likes
A 0.5B-parameter instruction-tuned model prepared for the Gensyn reinforcement learning swarm, supporting local fine-tuning.

Vit Base Patch16 224 In21k | google | Apache-2.0 | Image Classification | 2.2M downloads, 323 likes
A Vision Transformer model pre-trained on the ImageNet-21k dataset for image classification tasks.

Indonesian Roberta Base Posp Tagger | w11wo | MIT | Sequence Labeling, Transformers, Other | 2.2M downloads, 7 likes
A POS tagging model fine-tuned from an Indonesian RoBERTa model on the IndoNLU dataset, for Indonesian part-of-speech tagging tasks.