# Enhanced Semantic Understanding

SigLIP 2 image-encoder checkpoints published for `timm` (all Apache-2.0, all trained on the WebLI dataset):

| Model | Tags | Downloads | Likes | Description |
|---|---|---|---|---|
| `vit_so400m_patch16_siglip_gap_256.v2_webli` | Text-to-Image, Transformers | 22 | 0 | SigLIP 2 ViT image encoder using global average pooling, with the attention pooling head removed; for image feature extraction. |
| `vit_so400m_patch16_siglip_512.v2_webli` | Text-to-Image, Transformers | 2,766 | 0 | SigLIP 2 Vision Transformer for image feature extraction, suited to multilingual vision-language tasks. |
| `vit_so400m_patch16_siglip_256.v2_webli` | Text-to-Image, Transformers | 12.56k | 0 | Image-encoder half of a SigLIP 2 model, for image feature extraction. |
| `vit_so400m_patch14_siglip_224.v2_webli` | Image Classification, Transformers | 7,005 | 0 | SigLIP 2 Vision Transformer pretrained on WebLI, for image feature extraction. |
| `vit_large_patch16_siglip_512.v2_webli` | Image Classification, Transformers | 295 | 0 | SigLIP 2 ViT-Large image encoder packaged for `timm`, suited to vision-language tasks. |
| `vit_large_patch16_siglip_384.v2_webli` | Text-to-Image, Transformers | 4,265 | 0 | SigLIP 2 Vision Transformer pretrained on WebLI, for image feature extraction. |
| `vit_giantopt_patch16_siglip_256.v2_webli` | Text-to-Image, Transformers | 59 | 0 | SigLIP 2 Vision Transformer focused on image feature extraction. |
| `vit_base_patch16_siglip_512.v2_webli` | Text-to-Image, Transformers | 2,664 | 0 | SigLIP 2 ViT-Base image encoder pretrained on WebLI, for image feature extraction. |
Official `google` SigLIP 2 vision-language checkpoints (all Apache-2.0):

| Model | Tags | Downloads | Likes | Description |
|---|---|---|---|---|
| `google/siglip2-so400m-patch16-naflex` | Text-to-Image, Transformers | 159.81k | 21 | SigLIP 2 model building on the SigLIP pre-training objective, with improved semantic understanding, localization, and dense feature extraction; the NaFlex variant supports variable input resolutions. |
| `google/siglip2-large-patch16-256` | Text-to-Image, Transformers | 10.89k | 3 | Improved vision-language model based on SigLIP, with enhanced semantic understanding, localization, and dense feature extraction. |
| `google/siglip2-base-patch16-512` | Text-to-Image, Transformers | 28.01k | 10 | Vision-language model with enhanced semantic understanding, localization, and dense feature extraction. |
| `google/siglip2-base-patch16-256` | Image-to-Text, Transformers | 45.24k | 4 | Multilingual vision-language encoder with improved semantic understanding, localization, and dense feature extraction. |
| `google/siglip2-base-patch32-256` | Text-to-Image, Transformers | 9,419 | 4 | Improved version of SigLIP, with enhanced semantic understanding, localization, and dense feature extraction. |
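Unlike the `timm` encoders, the `google` checkpoints ship both towers and can do zero-shot image classification out of the box. A sketch using the Transformers `AutoModel`/`AutoProcessor` API (assumes a recent `transformers` with SigLIP 2 support; weights are downloaded on first run; the solid-color image is a stand-in for a real photo):

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-256"  # from the table above
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.new("RGB", (256, 256), color="red")  # placeholder input image
texts = ["a photo of a cat", "a photo of a solid red square"]

# SigLIP checkpoints expect max-length padding on the text side.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores image-text pairs with an elementwise sigmoid, not a softmax,
# so the per-label probabilities are independent and need not sum to 1.
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```

The sigmoid (rather than softmax) scoring is the defining trait of the SigLIP objective: each candidate caption is judged on its own, which also makes multi-label prompts straightforward.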
One additional encoder checkpoint outside the SigLIP 2 family:

| Model | Tags | Downloads | Likes | Description |
|---|---|---|---|---|
| `nghuyong/ernie-2.0-base-en` | Large Language Model, Transformers, English | 1,694 | 15 | ERNIE 2.0 is a continual pre-training framework proposed by Baidu in 2019 that incrementally constructs and optimizes pre-training tasks through continual multi-task learning; it outperformed BERT and XLNet on multiple benchmarks. |