
Access Global AI Models - Power Next-Gen Apps
From General to Specialized AI - All Models in One Platform
11,789 models match the criteria.

NSFW Image Detection · Falconsai · Apache-2.0
An NSFW image classification model based on the ViT architecture, pre-trained on ImageNet-21k via supervised learning and fine-tuned on 80,000 images to distinguish between normal and NSFW content.
Image Classification, Transformers · 82.4M downloads · 588 likes

CLIP ViT Large Patch14 · openai
CLIP is a vision-language model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, supporting zero-shot image classification.
Image-to-Text · 44.7M downloads · 1,710 likes

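The CLIP cards in this list all describe the same inference trick: embed the image and each candidate label's text, then compare them in the shared space. A minimal sketch of that zero-shot classification step, using toy vectors in place of real CLIP encoder outputs (the 3-dimensional embeddings and the fixed logit scale here are illustrative assumptions, not the model's actual values):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_emb, label_embs, labels, scale=100.0):
    """Score an image embedding against text label embeddings and
    softmax the scaled similarities (CLIP uses a learned logit scale)."""
    sims = np.array([cosine_sim(image_emb, e) for e in label_embs])
    logits = sims * scale
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], probs

# Toy embeddings standing in for CLIP encoder outputs (not real model weights)
image_emb = np.array([0.9, 0.1, 0.0])
label_embs = [np.array([1.0, 0.0, 0.0]),   # text embedding for "a photo of a cat"
              np.array([0.0, 1.0, 0.0])]   # text embedding for "a photo of a dog"
label, probs = zero_shot_classify(image_emb, label_embs, ["cat", "dog"])
print(label)  # cat
```

Because no label set is fixed at training time, the same weights classify against any list of candidate labels supplied at inference.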
RoBERTa Large · FacebookAI · MIT
A large English language model pre-trained with a masked language modeling objective, using improved BERT training methods.
Large Language Model, English · 19.4M downloads · 212 likes

CLIP ViT Base Patch32 · openai
CLIP is a multimodal model developed by OpenAI that can understand the relationship between images and text, supporting zero-shot image classification tasks.
Image-to-Text · 14.0M downloads · 666 likes

Segmentation 3.0 · pyannote · MIT
This is a powerset-encoded speaker diarization model capable of processing 10-second audio clips to identify multiple speakers and their overlapping speech.
Speaker Analysis · 12.6M downloads · 445 likes

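"Powerset-encoded" means the model predicts, per audio frame, one class for each possible *set* of active speakers (including the empty set for silence) rather than one independent probability per speaker. A small sketch of how those classes are enumerated; the speaker and overlap counts below are illustrative, not necessarily pyannote's actual configuration:

```python
from itertools import combinations

def powerset_classes(num_speakers, max_overlap):
    """Enumerate the output classes of a powerset diarization head: one class
    per subset of speakers with at most `max_overlap` active at the same time.
    The empty subset represents silence."""
    classes = []
    for k in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), k))
    return classes

# 3 speakers, at most 2 overlapping: 1 (silence) + 3 (solo) + 3 (pairs) = 7 classes
print(powerset_classes(3, 2))
```

Framing overlap as a multiclass problem lets a single softmax handle simultaneous speech directly instead of thresholding per-speaker activations.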
DistilBERT Base Uncased · distilbert · Apache-2.0
DistilBERT is a distilled version of the BERT base model, maintaining similar performance while being more lightweight and efficient, suitable for natural language processing tasks such as sequence classification and token classification.
Large Language Model, English · 11.1M downloads · 669 likes

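The distillation mentioned on the DistilBERT cards trains the small model to match the teacher's softened output distribution. A minimal numpy sketch of that soft-label loss term, with toy logits; the real recipe also combines a hard-label loss and a hidden-state cosine loss, which are omitted here:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_label_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher's temperature-softened
    distribution, scaled by T^2 so gradient magnitudes stay comparable
    across temperatures."""
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    return float(-(T ** 2) * np.sum(p_teacher * log_p_student))

# Toy logits: the student roughly tracks the teacher, so the loss is small but nonzero
teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.5, 1.2, 0.4])
print(soft_label_loss(student, teacher))
```

Raising the temperature T spreads the teacher's probability mass over wrong-but-plausible classes, which is exactly the "dark knowledge" the student is meant to absorb.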
CLIPSeg Rd64 Refined · CIDAS · Apache-2.0
CLIPSeg is an image segmentation model based on text and image prompts, supporting zero-shot and one-shot image segmentation tasks.
Image Segmentation, Transformers · 10.0M downloads · 122 likes

Llama 3.1 8B Instruct GGUF · modularai
Meta Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases, performing strongly on common industry benchmarks.
Large Language Model, English · 9.7M downloads · 4 likes

XLM-RoBERTa Base · FacebookAI · MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, using masked language modeling as the training objective.
Large Language Model, Supports Multiple Languages · 9.6M downloads · 664 likes

RoBERTa Base · FacebookAI · MIT
An English pre-trained model based on the Transformer architecture, trained on a large text corpus with a masked language modeling objective, supporting text feature extraction and downstream fine-tuning.
Large Language Model, English · 9.3M downloads · 488 likes

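The masked language modeling objective named on these RoBERTa cards can be sketched in a few lines: hide a random subset of tokens and train the model to recover the originals. This is a simplified illustration; RoBERTa's actual recipe also replaces some selected tokens with random words or leaves them unchanged, and the seed below is just for reproducibility:

```python
import random

def mask_tokens(tokens, mask_token="<mask>", mask_prob=0.15, seed=1):
    """Randomly hide ~15% of tokens and record the originals as prediction
    targets. Simplified relative to RoBERTa's 80/10/10 replacement scheme."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok        # the model is trained to predict these
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked, targets)
```

Because the targets come from the text itself, no labels are needed, which is what makes pre-training on terabytes of raw crawl data possible.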
Segmentation · pyannote · MIT
An audio processing model for voice activity detection, overlap detection, and speaker diarization.
Speaker Analysis · 9.2M downloads · 579 likes

ViT Face Expression · trpakov · Apache-2.0
A facial emotion recognition model fine-tuned from a Vision Transformer (ViT), classifying 7 facial expressions.
Face-related, Transformers · 9.2M downloads · 66 likes

OPT 125m · facebook · Other
OPT is an open pre-trained Transformer language model suite released by Meta AI, with parameter sizes ranging from 125 million to 175 billion, designed to match the performance of the GPT-3 series while promoting open research in large-scale language models.
Large Language Model, English · 6.3M downloads · 198 likes

CLIP ViT Large Patch14 336 · openai
A large-scale vision-language pretrained model based on the Vision Transformer architecture, supporting cross-modal understanding between images and text.
Text-to-Image, Transformers · 5.9M downloads · 241 likes

Llama 3.1 8B Instruct · meta-llama
Llama 3.1 is Meta's multilingual large language model series, available at 8B, 70B, and 405B parameter scales, supporting 8 languages and code generation, and optimized for multilingual dialogue scenarios.
Large Language Model, Transformers, Supports Multiple Languages · 5.7M downloads · 3,898 likes

T5 Base · google-t5 · Apache-2.0
T5-Base is a 220-million-parameter text-to-text Transformer model developed by Google, supporting multilingual NLP tasks.
Large Language Model, Supports Multiple Languages · 5.4M downloads · 702 likes

XLM-RoBERTa Large · FacebookAI · MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, trained with a masked language modeling objective.
Large Language Model, Supports Multiple Languages · 5.3M downloads · 431 likes

DistilBERT Base Uncased Finetuned SST-2 English · distilbert · Apache-2.0
A text classification model fine-tuned from distilbert-base-uncased on the SST-2 sentiment analysis dataset, reaching 91.3% accuracy.
Text Classification, English · 5.2M downloads · 746 likes

DINOv2 Small · facebook · Apache-2.0
A small-scale vision Transformer trained with the DINOv2 method, extracting image features through self-supervised learning.
Image Classification, Transformers · 5.0M downloads · 31 likes

Wav2Vec2 Large XLSR-53 Portuguese · jonatasgrosman · Apache-2.0
A fine-tuned XLSR-53 large model for Portuguese speech recognition, trained on the Common Voice 6.1 dataset, supporting Portuguese speech-to-text conversion.
Speech Recognition, Other · 4.9M downloads · 32 likes

ViT Base Patch16 224 · google · Apache-2.0
A Vision Transformer model pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification tasks.
Image Classification · 4.8M downloads · 775 likes

Whisper Large V3 · openai · Apache-2.0
Whisper is an advanced automatic speech recognition (ASR) and speech translation model developed by OpenAI, trained on over 5 million hours of labeled data, with strong cross-dataset and cross-domain generalization.
Speech Recognition, Supports Multiple Languages · 4.6M downloads · 4,321 likes

CLIP ViT Base Patch16 · openai
CLIP is a multimodal model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, enabling zero-shot image classification.
Image-to-Text · 4.6M downloads · 119 likes

Wav2Vec2 Large XLSR-53 Russian · jonatasgrosman · Apache-2.0
A Russian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz audio input.
Speech Recognition, Other · 3.9M downloads · 54 likes

BART Large CNN · facebook · MIT
A BART model pre-trained on an English corpus and fine-tuned on the CNN/Daily Mail dataset, well suited to text summarization.
Text Generation, English · 3.8M downloads · 1,364 likes

Wav2Vec2 Large XLSR-53 Chinese Zh Cn · jonatasgrosman · Apache-2.0
A Chinese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz audio input.
Speech Recognition, Chinese · 3.8M downloads · 110 likes

Fashion CLIP · patrickjohncyh · MIT
FashionCLIP is a vision-language model fine-tuned from CLIP for the fashion domain, capable of generating general-purpose product representations.
Text-to-Image, Transformers, English · 3.8M downloads · 222 likes

Jina Embeddings V3 · jinaai
Jina Embeddings V3 is a multilingual sentence embedding model supporting over 100 languages, specializing in sentence similarity and feature extraction tasks.
Text Embedding, Transformers, Supports Multiple Languages · 3.7M downloads · 911 likes

BART Large MNLI · facebook · MIT
A zero-shot classification model based on the BART-large architecture, fine-tuned on the MultiNLI dataset.
Large Language Model · 3.7M downloads · 1,364 likes

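An NLI model becomes a zero-shot classifier by recasting each candidate label as a hypothesis such as "This text is about <label>." and comparing how strongly the input entails each one. A sketch of the final normalization step, using made-up entailment logits in place of real model output:

```python
import math

def zero_shot_from_entailment(entailment_logits, labels):
    """Given one entailment logit per candidate hypothesis, softmax across
    the labels and return the best label plus the full distribution."""
    exps = [math.exp(z) for z in entailment_logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs

# Made-up entailment logits standing in for real NLI-model output
best, probs = zero_shot_from_entailment([2.1, -0.3, 0.4],
                                        ["sports", "politics", "tech"])
print(best)  # sports
```

Because the label set is supplied at inference time, the same NLI weights can classify against any list of categories without retraining.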
T5 Small · google-t5 · Apache-2.0
T5-Small is a 60-million-parameter model developed by Google, using a unified text-to-text framework to handle various NLP tasks.
Large Language Model, Supports Multiple Languages · 3.7M downloads · 450 likes

ESM2 T36 3B UR50D · facebook · MIT
ESM-2 is a next-generation protein model trained with masked language modeling objectives, suitable for fine-tuning on various downstream tasks with protein sequences as input.
Protein Model, Transformers · 3.5M downloads · 22 likes

FLAN-T5 Base · google · Apache-2.0
FLAN-T5 is a language model built on T5 and optimized through instruction fine-tuning, supporting multilingual tasks and outperforming the original T5 at the same parameter count.
Large Language Model, Supports Multiple Languages · 3.3M downloads · 862 likes

ALBERT Base V2 · albert · Apache-2.0
ALBERT is a lightweight pre-trained language model based on the Transformer architecture, reducing memory usage through a parameter-sharing mechanism, suitable for English text processing tasks.
Large Language Model, English · 3.1M downloads · 121 likes

Wav2Vec2 Large XLSR-53 Dutch · jonatasgrosman · Apache-2.0
A Dutch speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Common Voice and CSS10 datasets, supporting 16kHz audio input.
Speech Recognition, Other · 3.0M downloads · 12 likes

Wav2Vec2 Large XLSR-53 Japanese · jonatasgrosman · Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz audio input.
Speech Recognition, Japanese · 2.9M downloads · 33 likes

BLIP Image Captioning Base · Salesforce · BSD-3-Clause
BLIP is an advanced vision-language pretrained model, excelling in image captioning tasks and supporting both conditional and unconditional text generation.
Image-to-Text, Transformers · 2.8M downloads · 688 likes

DistilBERT Base Multilingual Cased · distilbert · Apache-2.0
DistilBERT is a distilled version of the BERT base multilingual model, retaining 97% of BERT's performance with fewer parameters and faster speed. It supports 104 languages and is suitable for various natural language processing tasks.
Large Language Model, Transformers, Supports Multiple Languages · 2.8M downloads · 187 likes

DistilGPT2 · distilbert · Apache-2.0
DistilGPT2 is a lightweight distilled version of GPT-2 with 82 million parameters, retaining GPT-2's core text generation capabilities while being smaller and faster.
Large Language Model, English · 2.7M downloads · 527 likes

XLM-RoBERTa Base Language Detection · papluca · MIT
A language detection model based on XLM-RoBERTa, classifying text across 20 languages.
Text Classification, Transformers, Supports Multiple Languages · 2.7M downloads · 333 likes

BLEURT 20 D12 · lucadiliello
A PyTorch implementation of the BLEURT model for text evaluation tasks in natural language processing.
Large Language Model, Transformers · 2.6M downloads · 1 like

Table Transformer Detection · microsoft · MIT
A table detection model based on the DETR architecture, designed for extracting tables from unstructured documents.
Object Detection, Transformers · 2.6M downloads · 349 likes

BLIP Image Captioning Large · Salesforce · BSD-3-Clause
BLIP is a unified vision-language pretraining framework, excelling at image caption generation and supporting both conditional and unconditional captioning.
Image-to-Text, Transformers · 2.5M downloads · 1,312 likes