# Image semantic encoding
Resnet50 Clip Gap.cc12m
Apache-2.0
A CLIP-style image encoder built on the ResNet50 architecture and trained on the CC12M dataset; it extracts image features through global average pooling (GAP).
Image Classification
Transformers

timm
19
0
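
The entry above says features are extracted through global average pooling (GAP). As a minimal sketch of that one step only, and not the model's actual implementation, the pure-Python snippet below averages each channel of a small feature map into a single number. The function name `global_average_pool` and the toy values are hypothetical, chosen for illustration.

```python
def global_average_pool(feature_map):
    """Average each channel of a C x H x W nested list into one value.

    Hypothetical helper for illustration; real encoders do this on tensors.
    """
    pooled = []
    for channel in feature_map:
        # Flatten the H x W grid of this channel and take its mean.
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy feature map: 2 channels, each 2 x 2 (illustrative values only).
toy = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
embedding = global_average_pool(toy)
print(embedding)  # [2.5, 1.0]
```

The output has one value per channel, so the length of the pooled vector depends only on the channel count and not on the spatial size of the feature map.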
Vit Large Patch16 224.mae
A large image feature extraction model based on the Vision Transformer (ViT), pre-trained on the ImageNet-1k dataset with the self-supervised Masked Autoencoder (MAE) method.
Image Classification
Transformers

timm
960
1
Vit Base Patch16 224.mae
An image feature extraction model based on the Vision Transformer (ViT), pre-trained on the ImageNet-1k dataset with the self-supervised Masked Autoencoder (MAE) method.
Image Classification
Transformers

timm
23.63k
2
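
Both ViT entries above are named "Patch16 224" and pre-trained with MAE. The sketch below illustrates what that implies: a 224-pixel input cut into 16-pixel patches, a random subset of which is hidden during pre-training. It is a rough illustration and not the models' training code. The 0.75 mask ratio is an assumption taken from the default reported for MAE and is not stated in this listing; the helper names `patch_grid` and `random_mask` are hypothetical.

```python
import random


def patch_grid(image_size=224, patch_size=16):
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into visible and masked sets.

    mask_ratio=0.75 is an assumed default, not taken from this listing.
    """
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    indices = list(range(num_patches))
    rng.shuffle(indices)
    visible = sorted(indices[:num_keep])
    masked = sorted(indices[num_keep:])
    return visible, masked


n = patch_grid()
visible, masked = random_mask(n)
print(n, len(visible), len(masked))  # 196 49 147
```

Under these assumptions the encoder would see only 49 of the 196 patches per image during pre-training.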